1.  Object  Recognition


Recognizing  and  locating  objects  from  sensory  data  is  a  common  element  of  many  of 
the  tasks  that  an  intelligent  system  must  perform.  Variations  of  the  problem  arise 
in  tasks  ranging  from  visual  inspection  to  hand-eye  coordination  to  autonomous 
vehicle  localization.  In  all  of  these  domains,  the  recognition  problem  can  be  generally 
characterized  as  follows:  Given  a  set  of  object  models,  and  given  sensory  data  about 
some  environment,  find  all  the  instances  of  the  models  in  the  environment,  both 
identifying  the  existence  of  an  instance,  and  identifying  the  location  of  that  instance. 

Each  solution  to  the  recognition  problem  usually  consists  of  a  specification  of  which 
subset  of  the  sensory  data  accounts  for  the  object  instance  and  the  transformation 
needed  to  map  the  object  model  from  its  own  inherent  coordinate  frame  into  the 
sensor's coordinate frame, in order to account for the sensory data.

Clearly,  the  information  contained  in  the  sensory  data  can  significantly  influence 
possible  approaches  to  the  problem.  In  general,  the  data  may  come  from  any  of  a 
number of modalities, including visual, range and tactile data, and that data is generally
noisy, partially occluded and partially spurious. Although other approaches
are  possible,  we  shall  restrict  our  attention  to  the  case  in  which  the  sensory  data, 
from  any  of  these  modalities,  can  be  processed  to  derive  measurements  about  the 
geometry of local portions of the object's boundary. In order to be robust, a recognition
system must be able to deal with noisy measurements of the position and orientation
of a patch of surface. In addition, the data may come from environments
in  which  much  of  the  data  is  spurious,  arising  from  objects  other  than  the  one  of 
interest,  and  in  which  much  of  the  object  of  interest  is  occluded,  so  that  sensory 
data  is  available  only  for  some  portions  of  the  object. 

The problem of recognizing rigid objects from noisy sensory data has been successfully
attacked in previous work by using a constrained search approach [Grimson
and Lozano-Perez 84, 87]. Empirical investigations have shown the method to be
very effective when recognizing and localizing isolated objects, but less effective
when dealing with occluded objects where much of the sensory data arises from objects
other than the one of interest. When clustering techniques such as the Hough
transform are used to isolate likely subspaces of the search space, empirical performance
in cluttered scenes improves considerably. In this note, we establish formal
bounds on the combinatorics of this approach. Under some simple assumptions, we
show that the expected complexity of recognizing isolated objects is quadratic in
the number of model and sensory fragments, but that the expected complexity of
recognizing objects in cluttered environments is exponential in the size of the correct
interpretation. We also provide formal bounds on the efficacy of using the Hough
transform to preselect likely subspaces, showing that the problem remains exponential,
but that in practical terms, the size of the problem is significantly decreased.

In the remainder of this section, we briefly describe the constrained search
method used to solve the recognition problem. In section 2, we consider the combinatorics
of unoccluded objects, obtaining general expressions for the expected
search. These results are extended to occluded objects in section 3. Specific bounds
relating the combinatorics to the object recognition problem are derived in sections
4 and 5. The impact of Hough transforms on the problem is considered in section 6.

1.1  Definition  of  a  solution 

In more formal terms, a solution to the recognition problem consists of a triplet,

$$(\text{object}_i, \{(d_{i_1}, m_{j_1}), (d_{i_2}, m_{j_2}), \ldots, (d_{i_k}, m_{j_k})\}, T)$$

where $\text{object}_i$ identifies which object from a library of known objects, the $(d, m)$
pairings are associations of a subset of the sensory data, $d$, with model elements, $m$,
from $\text{object}_i$, and $T$ is a transformation from model coordinates to sensor coordinates
such that each data fragment agrees with its transformed model element, to
within noise bounds.

Stated  in  such  general  terms,  there  are  a  variety  of  possibilities  for  specifying 
the  recognition  problem,  with  variations  in  the  types  of  models,  the  specifics  of  the 
sensor  data,  and  the  method  used  to  find  the  transformation.  In  this  article,  we  will 
restrict  attention  to  the  following  specific  case. 

•  We will assume that the objects are modeled as polygons or polyhedra, so that
each $m_j$ is a linear segment. The models need not be complete, so that gaps
are allowed.

•  We will also assume that the sensory data, $d_i$, can be processed to produce
estimates of the geometry of linear fragments of the object's boundary, either
line segments in the case of two-dimensional data, or planar patches in the case
of three-dimensional data.

•  We will assume that the objects are rigid, so that the transformation $T$ maps
points $v_m$ in model coordinates into points $v_d$ in sensor coordinates by

$$v_d = R v_m + v_0$$

where $R$ is a rotation matrix, and $v_0$ is a translation vector.
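
The rigid transformation in the last assumption can be illustrated for the two-dimensional case. This is only a sketch: the function name `rigid_transform_2d` and the parameterization of the rotation by a single angle `theta` are choices made here for illustration.

```python
import math

def rigid_transform_2d(theta, v0):
    """Return a function mapping model points to sensor points via v_d = R v_m + v_0."""
    c, s = math.cos(theta), math.sin(theta)

    def apply(vm):
        x, y = vm
        # Rotate by theta, then translate by v0.
        return (c * x - s * y + v0[0], s * x + c * y + v0[1])

    return apply

# Rotate the model point (1, 0) by 90 degrees, then shift it by (1, 0).
T = rigid_transform_2d(math.pi / 2, (1.0, 0.0))
vd = T((1.0, 0.0))
```
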

Even  in  this  case, there  are  a  variety  of  techniques  for  finding  the  solution,  most  of 
which  can  be  considered  as  different  forms  of  search.  Successful  approaches  have 
included maximal clique techniques [e.g. Bolles and Cain 82, Bolles et al. 84],
hypothesize and test methods [e.g. Ayache and Faugeras 86] and constrained search
[e.g. Grimson and Lozano-Perez 84, 87]. In this article, we are interested in the
constrained  search  approach. 

1.2  Constrained  search  as  applied  to  recognition 

The  basic  idea  is  to  find  legitimate  pairings  of  data  and  model  fragments  by  a 
depth first search of an interpretation tree (IT). We begin by associating the first
data fragment with the first model face, and represent this by a node at the first
level  of  the  tree.  If  this  association  is  feasible,  we  consider  associating  the  second 
data  fragment  with  the  first  model  face,  represented  as  a  node  at  the  second  level 
of  the  tree,  which  is  a  son  of  the  first  node.  If  this  pair  of  associations  is  still 
feasible,  we  continue  downward  in  the  tree,  associating  model  faces  with  the  next 
data  fragment.  If  this  pair  of  associations  is  not  feasible,  then  we  backtrack,  and 
consider associating the data fragment with the second model face, and so on. Once
we  have  considered  the  association  of  a  data  fragment  with  all  of  the  m  model  faces, 
we  also  consider  excluding  the  data  fragment  from  the  interpretation,  by  associating 
it  with  the  wild  card  (*)  or  null  branch.  Thus,  each  node  of  the  tree  describes  a 
partial  interpretation  of  the  data,  and  implicitly  contains  a  set  of  pairings  of  data 
fragments  and  model  faces.  Nodes  at  the  ith  level  of  the  tree  define  assignments 
for the first $i$ data fragments. Each node branches at the next level in up to $m + 1$
ways,  where  m  is  the  number  of  model  faces  in  the  object.  The  last  branch  is  a  wild 
card  or  null  branch  and  has  the  effect  of  excluding  the  data  fragment  corresponding 
to  the  current  level  of  the  tree  from  the  interpretation  defined  at  that  node.  An 
example  is  shown  in  Figure  1.  With  the  inclusion  of  the  wild  card  branch,  any  node 
at  level  i  defines  a  mapping  from  a  subset  of  the  first  i  data  points  to  actual  faces 
of  the  object  model. 

Given $s$ data fragments, any leaf of the tree specifies an interpretation

$$\{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_s, m_{j_s})\},$$

where some of the $m_{j_k}$ may be the wild card character. By excluding such matches,
the leaf yields a partial interpretation

$$\{(d_{i_1}, m_{j_{i_1}}), (d_{i_2}, m_{j_{i_2}}), \ldots, (d_{i_k}, m_{j_{i_k}})\}$$

where $1 \le i_1 < i_2 < \ldots < i_k \le s$, but these indices may not include the entire set from 1
to $s$. This interpretation may then be used to solve for a rigid, scaled transformation
that  maps  model  faces  into  corresponding  data  fragments,  if  such  a  transformation 
exists.  This  transformation  must  map  the  faces  so  that  both  the  position  and  the 
orientation  of  the  face  are  consistent  with  the  associated  data  point,  modulo  noise 
in  the  measurements.  Thus,  by  searching  for  leaves  of  the  tree  and  testing  that  the 
interpretation  there  yields  a  legal  transformation,  we  can  find  possible  instances  of 
object  models  in  the  data,  and  solve  the  recognition  problem. 

Because  this  search  process  is  inherently  an  exponential  problem,  the  key  to  an 
efficient  solution  is  to  use  constraints  to  remove  large  subtrees  from  consideration 
without  explicitly  having  to  explore  them,  thereby  providing  a  specific  definition  for 
the notion of feasible in the above discussion. In [Grimson and Lozano-Perez 84, 87]
we  describe  a  constrained  search  method  called  RAF  (for  Recognition  and  Attitude 
Finder),  that  uses  a  set  of  constraints  based  on  the  relative  shape  of  parts  of  objects, 
either  in  two  dimensions  or  in  three.  In  this  work,  the  object  models  and  the  sensory 
data  consist  of  linear  edge  or  face  fragments.  The  constraints  include  the  following: 

•  The length (area) of a data fragment must be smaller than the length (area) of
a corresponding model fragment, up to some bounded measurement error;




Figure  1.  An  Interpretation  Tree.  Each  node  of  the  tree  defines  a  partial  interpretation, 
where  the  level  of  each  ancestor  defines  a  sensory  data  point,  and  the  branch  leading 
to  each  such  node  defines  the  corresponding  model  fragment.  An  example  of  a  partial 
interpretation is shown, where $d_i$ denotes the $i$th data point and $m_k$ denotes the $k$th model
fragment.  The  *  indicates  the  wild  card  branch,  corresponding  to  the  exclusion  of  the 
associated  data  point  from  the  interpretation. 

•  The  angle  between  the  normals  to  a  pair  of  data  fragments  must  differ  from  the 
angle  between  the  normals  of  the  corresponding  model  fragments  by  no  more 
than  a  bounded  measurement  error; 

•  The  range  of  distances  between  two  data  fragments  must  lie  within  the  range 
of  distances  of  the  corresponding  model  fragments,  where  the  model  range  has 
been  expanded  to  account  for  measurement  errors; 

•  The range of components of a vector spanning the two data fragments in the direction
of each of the fragments' normals must lie within the corresponding range
of components for vectors spanning the model fragments, modulo measurement
error;

•  A  data  fragment  assigned  to  the  wild  card  is  always  consistent. 

It  is  possible  to  extend  these  constraints  to  handle  the  recognition  of  curved  objects 
in  two  dimensions  [Grimson  87],  but  here  we  stay  with  linear  elements. 
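
Two of the constraints listed above, the unary length test and the binary angle test, might be coded as follows for two-dimensional edge fragments. This is a minimal sketch and not the RAF implementation: the `Edge` record, the function names, the use of a signed angle between normals, and the error parameters `eps_len` and `eps_ang` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    mid: tuple      # midpoint (x, y)
    angle: float    # direction of the edge normal, in radians
    length: float

def length_ok(d, m, eps_len):
    """Unary test: a data fragment may be shorter than its model edge, but not longer."""
    return d.length <= m.length + eps_len

def angle_between(a, b):
    """Signed angle from a's normal to b's normal, wrapped into (-pi, pi]."""
    diff = (b.angle - a.angle) % (2 * math.pi)
    return diff - 2 * math.pi if diff > math.pi else diff

def angle_ok(d1, d2, m1, m2, eps_ang):
    """Binary test: the data pair and the model pair must have nearly the same relative angle."""
    delta = angle_between(d1, d2) - angle_between(m1, m2)
    delta = (delta + math.pi) % (2 * math.pi) - math.pi   # wrap the difference as well
    return abs(delta) <= eps_ang
```

The distance and component constraints would follow the same pattern, comparing a range measured on the data pair against a range precomputed for the model pair and widened by the error bound.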

1.3  The  constraints  reduce  the  search

Given  these  unary  and  binary  constraints,  the  constrained  search  process  consists 
of  a  depth  first  search,  with  downward  termination  based  on  constraint  consistency. 
Suppose  the  search  process  is  currently  at  some  node  at  level  k  in  the  interpretation 
tree  and  with  a  consistent  partial  interpretation  given  by 

$$I_k = \{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}.$$

We now consider the next data fragment $d_{k+1}$, and its possible assignment to model
face $m_{j_{k+1}}$, where $j_{k+1}$ varies from 1 to $m + 1$. This leads to a potential new
interpretation

$$I_{k+1} = \{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_{k+1}, m_{j_{k+1}})\}.$$

The  following  rules  hold. 

•  If $m_{j_{k+1}}$ is the wild card match, then the new interpretation $I_{k+1}$ is consistent,
and we continue downward in our search.

•  If $m_{j_{k+1}}$ is a real model edge fragment, we must verify that the length constraint
holds for matching $d_{k+1}$ to $m_{j_{k+1}}$, and that the angle, distance and component
constraints hold for the pairings $[(d_{k+1}, m_{j_{k+1}}), (d_i, m_{j_i})]$, for $1 \le i \le k$.

•  If all of these constraints are true, then the new interpretation $I_{k+1}$ is a consistent
partial interpretation, and we continue our depth first search. If one of
them is false, then the partial interpretation is inconsistent. In this case, we
increment the model face index $j_{k+1}$ by 1 and try again with a new $I_{k+1}$, until
$j_{k+1} = m + 1$.

If  the  search  process  is  currently  at  some  node  at  level  k  in  the  interpretation  tree, 
and  has  an  inconsistent  partial  interpretation  given  by 

$$I_k = \{(d_1, m_{j_1}), (d_2, m_{j_2}), \ldots, (d_k, m_{j_k})\}$$

then it is in the process of backtracking. If $j_k = m + 1$ (the wild card) we backtrack
up another level, otherwise we increment $j_k$ and continue.
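
The rules of this subsection can be sketched as a short recursive search. This is an illustration under stated assumptions, not the RAF implementation: the names `constrained_search`, `unary_ok` and `binary_ok` are hypothetical, the constraint tests are supplied by the caller, and recursion stands in for the explicit index bookkeeping described above.

```python
WILD = None  # stands for the wild card (*) branch

def constrained_search(data, model, unary_ok, binary_ok):
    """Depth-first search of the interpretation tree.

    Returns every leaf reached, as a list whose i-th entry is the index of the
    model face assigned to data fragment i, or WILD.
    """
    leaves = []

    def consistent(partial, j):
        if j is WILD:
            return True                      # the wild card is always consistent
        k = len(partial)                     # index of the data fragment being assigned
        if not unary_ok(data[k], model[j]):
            return False
        # Check the new pairing against every earlier real pairing.
        return all(
            ji is WILD or binary_ok(data[i], model[ji], data[k], model[j])
            for i, ji in enumerate(partial)
        )

    def descend(partial):
        if len(partial) == len(data):
            leaves.append(list(partial))
            return
        # Try the m real faces in order, then the wild card branch.
        for j in list(range(len(model))) + [WILD]:
            if consistent(partial, j):
                partial.append(j)
                descend(partial)
                partial.pop()                # backtrack

    descend([])
    return leaves
```

Each leaf returned here would still have to pass the model test of the next subsection before being accepted.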

1.4  Model  tests 

Once  the  search  process  reaches  a  leaf  of  the  interpretation  tree,  we  have  accounted 
for  all  of  the  data  points.  We  are  now  ready  to  determine  if  the  interpretation  is  in 
fact  globally  valid.  To  do  this,  we  solve  for  a  rigid  transformation  mapping  points 
$v_m$ in model coordinates into points $v_d$ in sensor coordinates,

$$v_d = R v_m + v_0$$

where $R$ is a rotation matrix, and $v_0$ is a translation vector. We can solve for this
transformation in a number of ways [e.g. Grimson and Lozano-Perez 84, 87, Ayache
and Faugeras 86].

Given such a transformation, which is usually found using some type of least
squares fit, we must then ensure that the interpretation actually satisfies it. We do
this by considering each of the data fragments associated with a real model fragment
in the interpretation, and transforming the associated model fragment by the
computed transform. For each such fragment, we then verify that the transformed
fragment differs in position and orientation from its associated data fragment by
amounts that are less than some acceptable error bounds. These bounds on transform
error can be obtained from the predefined bounds on the sensor error [Grimson
86b]. Any interpretation that passes such a model test is a consistent interpretation
of the data.
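
A simplified version of the model test can be sketched for the two-dimensional case. The sketch makes several assumptions that are not in the text: each matched fragment is reduced to a single point (for instance its midpoint), only the position error is checked (the orientation check is omitted), and the function names and the bound `eps_pos` are hypothetical. The fit is the standard closed-form least squares solution for a planar rotation and translation.

```python
import math

def fit_rigid_2d(model_pts, data_pts):
    """Least-squares rotation angle and translation taking model_pts onto data_pts."""
    n = len(model_pts)
    mx = sum(p[0] for p in model_pts) / n
    my = sum(p[1] for p in model_pts) / n
    dx = sum(p[0] for p in data_pts) / n
    dy = sum(p[1] for p in data_pts) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(model_pts, data_pts):
        ax, ay = px - mx, py - my            # centered model point
        bx, by = qx - dx, qy - dy            # centered data point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation carries the rotated model centroid onto the data centroid.
    v0 = (dx - (c * mx - s * my), dy - (s * mx + c * my))
    return theta, v0

def passes_model_test(model_pts, data_pts, theta, v0, eps_pos):
    """True if every transformed model point lies within eps_pos of its data point."""
    c, s = math.cos(theta), math.sin(theta)
    for (px, py), (qx, qy) in zip(model_pts, data_pts):
        tx, ty = c * px - s * py + v0[0], s * px + c * py + v0[1]
        if math.hypot(tx - qx, ty - qy) > eps_pos:
            return False
    return True
```
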

1.5  Additional  search  reductions 

While  the  constrained  search  technique  described  above  will  succeed  in  finding  all 
consistent  interpretations  of  the  sensory  data,  for  a  given  object  model,  it  is  not 
particularly  efficient.  This  is  mostly  due  to  the  problem  of  segmenting  the  data  to 
determine  subsets  that  belong  to  a  single  object.  Indeed,  if  all  of  the  sensory  data 
do  belong  to  one  object,  the  described  method  is  known  to  be  quite  efficient,  as  has 
been  verified  both  empirically  [Grimson  and  Lozano-Perez  84,  87]  and  theoretically 
[Grimson 86a]. In order to improve the efficiency of the method, we can add two
additional  methods  to  our  search  process,  both  previously  discussed  for  the  case 
of  linear  fragments  in  [Grimson  and  Lozano-Perez  87],  and  extended  to  circular 
fragments  in  [Grimson  87]. 

The  first  is  to  use  a  parameter  hashing  scheme,  such  as  a  Hough  transform,  to 
hypothesize  small  subspaces  of  the  entire  search  space  that  are  likely  to  contain  an 
interpretation  (a  more  detailed  treatment  of  the  Hough  transform  appears  in  a  later 
section).  The  second  is  to  use  a  measure  of  goodness  of  match,  such  as  the  portion  of 
the  object  perimeter  (in  2D)  or  the  object  surface  area  (in  3D)  correctly  accounted 
for  by  the  matched  sensory  data,  to  prematurely  terminate  the  search  process.  That 
is,  as  soon  as  an  interpretation  is  found  whose  value  under  that  measure  exceeds 
a  predefined  threshold,  the  search  process  is  terminated,  with  that  interpretation 
taken  as  the  correct  solution.  Both  of  these  methods  are  known  empirically  to 
considerably  reduce  the  search  needed. 

1.6  Empirical  Performance 

The recognition method described in the previous sections has been tested on a variety
of data, including two-dimensional recognition from grey-level images [Grimson
and Lozano-Perez 84, 87], and three-dimensional recognition from laser range data
[Grimson  and  Lozano-Perez  87],  silhouettes  [Van  Hove  87],  stereo  data  [Porrill,  et 
al.  87],  motion  data  [Murray  87]  and  tactile  data  [Grimson  and  Lozano-Perez  84].  In 
all  of  these  cases,  the  method  typically  finds  a  unique  interpretation  quite  rapidly, 
in  the  presence  of  varying  amounts  of  sensor  noise. 

For  example,  in  [Grimson  and  Lozano-Perez  87],  we  report  on  experiments  in 
which an object containing 50 model edges was correctly identified in scenes containing
100 data edges, when as little as .25 percent of the object was visible in the scene.
Over  100  different  trials,  the  median  search  effort  involved  the  exploration  of  59000 
nodes  of  the  interpretation  tree  when  using  a  Hough  transform.  In  elapsed  time, 
such  exploration  typically  took  only  a  few  seconds  on  a  Symbolics  Lisp  Machine. 


2.  The  Combinatorics  of  Isolated  Objects 


Given  that  the  RAF  recognition  technique  has  good  empirical  performance,  our  goal 
is to prove that such empirical observations are generally valid. We begin by considering
the combinatorics of recognizing isolated objects, that is, situations in which
all  of  the  sensory  data  is  known  to  lie  on  a  single  object.  An  earlier  study  of  this 
problem  is  presented  in  [Grimson  86a].  In  that  study,  we  used  a  simple  model  of  the 
recognition  system  to  develop  estimates  of  the  performance  of  the  system.  In  this 
section,  we  use  a  more  complete  model  to  develop  better  bounds  on  the  performance 
of  the  system. 

Because  we  formulate  it  as  a  search  process,  our  approach  to  object  recognition 
can  be  considered  as  a  problem  of  constraint  satisfaction,  or  consistent  labeling. 
There are several general results available concerning the characteristics of consistent
labeling techniques [e.g. Freuder 78, 82, Gaschnig 79, Haralick and Elliot 80,
Haralick  and  Shapiro  79,  Mackworth  77,  Mackworth  and  Freuder  85,  Montanari 
74,  Nudel  83,  Waltz  75].  In  particular,  general  bounds  on  the  expected  number  of 
solutions,  on  the  expected  number  of  consistency  checks  performed  at  each  level  of 
the  search  tree,  and  on  the  expected  number  of  consistent  nodes  at  each  level  of  the 
tree  are  known.  We  will  use  a  specific  instance  of  the  framework  provided  by  these 
results  to  derive  explicit  bounds  on  our  version  of  the  recognition  problem. 


2.1  Model  of  consistency  -  unoccluded  case 


We  are  particularly  interested  in  bounds  on  the  number  of  interpretations  delivered 
by  the  system,  and  in  bounds  on  the  amount  of  work  performed  by  the  system,  in 
this  case  measured  as  the  number  of  nodes  of  the  search  tree  actually  explored  by 
the  system.  Since  our  method  uses  both  unary  and  binary  constraints,  we  need  to 
model the probability that a data-model assignment is consistent and the probability
that  a  pair  of  data-model  assignments  are  consistent. 

We let $q_{i,I}$ denote the probability that assigning the $i$th data element to the $I$th
model element is consistent, and we let $q_{i,j;I,J}$ denote the probability that the pair
of assignments $i \mapsto I$, $j \mapsto J$ is consistent. Our model of the recognition problem is
defined  as  follows. 

For a single data-model pairing, if the pairing is part of the correct interpretation,
the probability of consistency is simply 1. If it is not correct, we let the
probability of consistency be $p_1$. Thus, we have

$$q_{i,I} = \begin{cases} 1 & \text{if } i \mapsto I \text{ is correct} \\ p_1 & \text{otherwise.} \end{cases}$$


For a pair of assignments, suppose we are considering a match in which data
fragments $i, j$ are paired with model fragments $I, J$ respectively. We will model
the situation by saying that the consistency of this pair of pairs has probability
1 if these pairings are part of the correct interpretation, and has probability $p_2$
otherwise. Note that this is essentially assuming a random distribution of edges.
It is also assuming that pairs of model edges are distinctive, so that objects with
partial symmetries are excluded. Thus, we have

$$q_{i,j;I,J} = \begin{cases} 1 & \text{if } i \mapsto I,\ j \mapsto J \text{ is correct} \\ p_2 & \text{otherwise.} \end{cases}$$

Because the data is known to lie on a single object, we do not need to use the
wild card branch of the search tree, so that each node of the search tree has only $m$
branches in this case. Thus, the search tree has $m^k$ nodes at level $k$. However, not
all of these are actually reached by the algorithm.

In general, a node at the $k$th level of the tree, with assignment $1 \mapsto I_1, \ldots, k \mapsto I_k$,
has a probability of consistency:
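
Assuming the individual unary and binary checks behave as independent events (an assumption stated here rather than taken from the text, though it is what lets the expected count be a plain sum over mappings), this probability would take a product form. Writing $q_{i,I}$ for the unary probability and $q_{i,j;I,J}$ for the binary one, a sketch is:

```latex
\Pr[\,1 \mapsto I_1, \ldots, k \mapsto I_k \text{ is consistent}\,]
  = \prod_{i=1}^{k} q_{i,I_i} \prod_{1 \le i < j \le k} q_{i,j;I_i,I_j}
```

Under this reading, an assignment with $j$ correct pairings and $k - j$ incorrect ones has probability $p_1^{k-j}\, p_2^{\binom{k}{2} - \binom{j}{2}}$, since only incorrect pairings, and pairs containing at least one incorrect pairing, contribute factors below 1.
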

2.2  Simple  bounds  on  the  problem

We let $n_k$ denote the number of consistent nodes at the $k$th level of the interpretation
tree, under this model of consistency. The expected number of consistent nodes is
simply the sum of the probability above taken over all mappings. We are interested
in bounds on $n_s$, the number of interpretations of the $s$ sensory data fragments.
Simple bounds on the number of interpretations are given by the following result.
In the interests of clarity of presentation, the proof is deferred to the appendix.


Proposition 1: If all of the $k$ sensory measurements are known to lie on a
single object with $m$ faces, then the number of interpretations is bounded by

$$n_k \le \left[1 + (m - 1)\, p_1\, p_2^{(k-1)/2}\right]^k$$

and  by 

r  i  A:  Kt-D  k 2 

>  i  +  [pf  +  -  '  )  vT^  -  vT 

where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency.
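
The upper bound can be checked numerically against the expected count implied by the consistency model. The sketch below rests on two assumptions made here: that the upper bound reads $[1 + (m-1)\,p_1\,p_2^{(k-1)/2}]^k$, and that the expected number of consistent nodes is obtained by grouping the $m^k$ assignments by their number $j$ of correct pairings. The function names are hypothetical.

```python
from math import comb

def expected_consistent_nodes(m, k, p1, p2):
    """Sum the model's consistency probability over all m**k assignments at level k,
    grouped by the number j of correct pairings."""
    total = 0.0
    for j in range(k + 1):
        wrong = k - j
        bad_pairs = comb(k, 2) - comb(j, 2)   # pairs with at least one wrong pairing
        total += comb(k, j) * (m - 1) ** wrong * p1 ** wrong * p2 ** bad_pairs
    return total

def upper_bound(m, k, p1, p2):
    """Upper bound as read here: [1 + (m-1) p1 p2^((k-1)/2)]^k."""
    return (1.0 + (m - 1) * p1 * p2 ** ((k - 1) / 2.0)) ** k
```

For $k = 1$ there are no pairs, so the two quantities coincide at $1 + (m-1)p_1$; for larger $k$ the bound is loose because it replaces each exponent of $p_2$ by a smaller one.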

This provides us with formal bounds on the number of $k$-interpretations. Bounds
on the number of nodes explored in the tree can be obtained by

$$\sum_{k=1}^{s-1} m\, n_k,$$

because  the  algorithm  must  look  at  each  of  the  nodes  below  a  consistent  node,  even 
if  not  all  of  these  subsidiary  nodes  are  themselves  consistent. 

In principle, the bounds on the number of $k$-interpretations are exponentials
in $k$. But because $p_1, p_2 < 1$, we can see that as $k$ increases, the base of the
exponential decreases. This suggests that $n_k$ may decrease as $k$ gets large enough, but
to establish this formally, we need to relate the probabilities $p_1, p_2$ to properties of
the object, in particular to $m$. Before we do that, we consider formal bounds on the
case of occluded recognition.


3.  The  Combinatorics  of  Occluded  Objects 


We  want  to  extend  our  analysis  to  the  case  in  which  the  scene  is  cluttered,  so  that 
much  of  the  object  of  interest  may  be  occluded,  and  so  that  much  of  the  data 
obtained  may  come  from  objects  other  than  the  one  of  interest.  To  model  this, 
we  will  again  assume  the  object  has  m  faces,  that  there  are  s  sensory  fragments,  of 
which  c  actually  lie  on  the  object  to  be  recognized.  We  need  to  determine  bounds  on 
$n_k^*$, the number of interpretations, and $N^*$, the number of nodes of the interpretation
tree  actually  examined. 


3.1  Model  of  consistency  —  occluded  case 

A node at the $k$th level of the tree defines a $k$-interpretation, assigning model faces
to the first $k$ data fragments. Each such interpretation can be specified by choosing
$j$ (out of $c$) of the data points lying on the object to be correctly matched to a model
face, and choosing $r - j$ of the remaining data points (either lying on the object
or not) to be incorrectly matched, with the remaining data points assigned to the
wild card. Such an interpretation would have $r$ actual matches, and $k - r$ wild card
matches. We denote by $n_{k,r}$ the number of such $(k, r)$-interpretations.

We need to determine which of these interpretations are consistent. For the
unary constraints, any wild card match is consistent with probability 1, as is any
correct match. The remaining $r - j$ incorrect matches each have probability of
consistency $p_1$. Thus, we have

$$q_{i,I} = \begin{cases} 1 & \text{if } i \mapsto I \text{ is correct} \\ 1 & \text{if } I \text{ is the wild card character} \\ p_1 & \text{otherwise.} \end{cases}$$


Any pair of assignments, both of which are correct, is consistent with probability
1. Any pair of assignments, at least one of which is assigned to the wild card, is also
consistent with probability 1. Thus, we have

$$q_{i,j;I,J} = \begin{cases} 1 & \text{if } i \mapsto I,\ j \mapsto J \text{ is correct} \\ 1 & \text{if either } I \text{ or } J \text{ is the wild card character} \\ p_2 & \text{otherwise.} \end{cases}$$

Using  this  model  of  consistency,  we  can  establish  the  following  bounds.  The 
proof  is  deferred  to  the  appendix. 


Proposition 2: Given an object with $m$ faces and given $k$ sensory data points,
of which $c$ actually lie on the object, the number of interpretations $n_k^*$ is bounded
by

n'k  <2°  -  [l  +p2]c  +  [l  +  mp-[  p\ ]  k~C[pi  +  1  +  mpi p| ] ° 

+  mp,  [l  -  pf  ]  [l  +  PiY'1  [k  +  Pi(k  -  c)] 

and  by 

K  >  2C  -  [l  +  Pi^~ ] C  +  [l  +  (m  -  l)p,p2-Hfc_C[l  +  (m  -  l)pip2~5-  +  P2~5_]C 
+  px(m  -  l)[l  +  p-i]"'"1  [k  +  p2(k  -  c)] 

-  Pi(m  -  1  )pP~  [l  +  pTr~Y~X  (*  +  Pi^  (k  ~  c)] 
where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency.


As in the non-occluded case, bounds on the number of nodes explored in the
tree can be obtained from

$$N^* = \sum_{k=1}^{s-1} m\, n_k^*.$$

In order to make sense out of these rather messy equations, we again need to
relate the probabilities of consistency $p_1, p_2$ to properties of the objects.


4.  Bounding  the  probability  of  consistency 


In  the  previous  sections,  we  have  derived  bounds  on  the  problem,  as  a  function  of 
the  probability  of  consistency.  It  is  desirable,  however,  to  reduce  these  expressions 
to  ones  involving  parameters  of  the  problem,  in  particular,  to  characteristics  of 
the  object  models  and  the  sensory  data.  In  the  following  sections,  we  derive  such 
expressions,  under  some  simplifying  assumptions. 

4.1  Consistency  in  the  two  dimensional  case

We begin with the probability of unary consistency, $p_1$. If $\ell$ is the length of the data
fragment, and $L$ is the length of the model segment, then the probability that this
pairing, made at random, is consistent is given by the probability that

$$\ell \le L + \epsilon$$

where $\epsilon$ is a bound on the error in measuring the length of the data edge. If we let
$f(\ell)$ denote the distribution of data lengths, and $F(L)$ denote the distribution of
model lengths, then the probability of consistency is simply given by

$$p_1 = \int_{L=0}^{D} \int_{\ell=0}^{\min(L+\epsilon,\,D)} f(\ell)\, F(L)\, d\ell\, dL$$


where  D  is  the  dimension  of  the  image.  In  the  worst  case,  this  is  just  1,  which  holds, 
for  example,  when  the  model  segments  all  have  the  same  length  and  all  of  the  data 
fragments  are  smaller  than  this  length.  If  other  models  of  length  distribution  are 
chosen,  a  different  probability  can  be  derived. 
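
As one example of a different length model, the probability can be evaluated numerically. The sketch below assumes that $f$ and $F$ are probability densities supported on $[0, D]$, and it compares a midpoint-rule estimate with the closed form $1 - \frac{1}{2}(1 - \epsilon/D)^2$ that holds when both densities are uniform. The uniform choice, the grid size `n` and the function names are illustrative assumptions.

```python
def p1_numeric(D, eps, f, F, n=400):
    """Midpoint-rule estimate of the integral of f(l) F(L) over the region l <= min(L + eps, D)."""
    h = D / n
    total = 0.0
    for a in range(n):
        L = (a + 0.5) * h
        upper = min(L + eps, D)
        for b in range(n):
            l = (b + 0.5) * h
            if l <= upper:
                total += f(l) * F(L) * h * h
    return total

def p1_uniform(D, eps):
    """Closed form when both densities are uniform on [0, D] and 0 <= eps <= D."""
    return 1.0 - 0.5 * (1.0 - eps / D) ** 2
```

With uniform lengths and no measurement error, a random pairing passes the length test half of the time, so the unary constraint alone prunes only modestly under this model.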


Figure  2.  A  pair  of  model  edges. 

Now  we  turn  to  the  probability  of  binary  consistency.  In  the  RAF  system,  a  pair  of 
fragments  is  characterized  by  the  relationship  between  the  fragment  normals,  and 
by  the  components  of  the  family  of  separation  vectors  between  the  fragments.  We 
first  transform  this  representation  into  a  more  convenient  one. 


Claim 1: A pair of edges whose relationship is defined by the ranges of the
constraints used in the RAF system is equivalently described by the relative transformation
needed to align one with the other.

Proof: Consider two model edges, each given by a midpoint, $M_i$, a unit tangent,
$T_i$, and a length $L_i$, as shown in Figure 2. We can characterize the two edges
by the relative transformation needed to transform edge $i$ into edge $j$. This is given
by the angle $\theta_{ij}$ needed to align the tangent vector $T_i$ with the tangent vector $T_j$,
and the translation $t_{ij}$ needed to shift $M_i$ to $M_j$. We must first show that such a
representation is equivalent to the one used in the constrained search process.




Figure  3.  The  range  of  positions  for  edge  j,  given  a  component  constraint. 

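Consider edge $i$. We are given a range $[c_{i\ell}, c_{ih}]$ of values defining the range of
possible components of a separation vector in the direction of the normal to edge $i$.
In general, a separation vector between the two edges is given by

$$S(\alpha, \beta) = M_i + \alpha T_i - M_j - \beta T_j$$

where $\alpha \in [-L_i/2, L_i/2]$ and $\beta \in [-L_j/2, L_j/2]$. Now the actual range of components
is given by

$$\langle S(\alpha, \beta), T_i^{\perp} \rangle = \langle M_i - M_j, T_i^{\perp} \rangle - \beta \langle T_j, T_i^{\perp} \rangle$$

where $\langle \cdot, \cdot \rangle$ denotes the standard Euclidean inner product and $T_i^{\perp}$ is the unit normal
to edge $i$. Because $\beta$ ranges from $-L_j/2$ to $L_j/2$,

$$c_{ih} - c_{i\ell} = L_j\, |\sin\theta|.$$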

Thus  any  edge  that  lies  entirely  within  the  region  shown  in  Figure  3  is  consistent 
with  this  constraint. 
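
The width of this band can be checked numerically. The following Python sketch is illustrative only: it assumes edge i lies along the x axis with its unit normal along the y axis, and the function name and the sample midpoints are hypothetical choices rather than anything taken from the RAF system.

```python
import math

def normal_component_range(theta, L_i, L_j, m_i=(0.0, 0.0), m_j=(3.0, 2.0), n=21):
    """Return (min, max) of <S(alpha, beta), N_i> over both edges.

    Assumes edge i has tangent (1, 0) and unit normal N_i = (0, 1), and that
    edge j has its tangent at angle theta to edge i.
    """
    t_i = (1.0, 0.0)
    n_i = (0.0, 1.0)
    t_j = (math.cos(theta), math.sin(theta))
    values = []
    for a in range(n):
        alpha = -L_i / 2 + L_i * a / (n - 1)
        for b in range(n):
            beta = -L_j / 2 + L_j * b / (n - 1)
            # S(alpha, beta) = M_i + alpha*T_i - M_j - beta*T_j
            s = (m_i[0] + alpha * t_i[0] - m_j[0] - beta * t_j[0],
                 m_i[1] + alpha * t_i[1] - m_j[1] - beta * t_j[1])
            # Component of the separation vector along the normal of edge i.
            values.append(s[0] * n_i[0] + s[1] * n_i[1])
    return min(values), max(values)

low, high = normal_component_range(theta=math.pi / 6, L_i=5.0, L_j=4.0)
width = high - low  # compare with L_j * |sin(theta)|
```

Because the component is linear in beta and independent of alpha, the extremes occur at the two ends of edge j, which is why the width depends only on the length of edge j and the relative angle.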

Now edge j must lie at an angle $\theta$ with respect to edge i, and must lie
entirely within the range of positions shown in Figure 3. Given the length of edge
j and its orientation relative to edge i, this implies that the center of edge j
must lie somewhere along the line midway between the two bounds shown. But the same
analysis holds relative to edge j, i.e. there is a range of distances perpendicular
to it, within which edge i must lie. This is shown in Figure 4. This implies that
edge j must have its midpoint along line X such that edge i lies inside the region
shown.

As a consequence, there is only one position along the line X such that the
midpoint of edge i lies along the line Y. Note that while we have demonstrated this
geometrically, it can also be established algebraically. ∎


This claim implies that the angle and component constraints used by our
recognition system are equivalent to the specification of two edges in terms of
their relative transformation. Hence, a pair of model edges can be equivalently
specified in our system in terms of a relative transformation
$(\theta_{ij}, t_{ij})$.

Figure 4. The range of positions for edge i, given a component constraint.

The  idea  is  to  use  the  characterization  of  a  pair  by  their  relative  transformation 
to  determine  the  consistency.  Since  binary  consistency  uses  pairs  of  segments,  we 
must  relate  a  pair  of  data  edges  to  a  corresponding  pair  of  model  edges.  Suppose  we 
are  considering  the  consistency  of  matching  a  pair  of  data  edges  to  a  pair  of  model 
edges. We know that a pair of model edges is specified by its relative transformation.
We need to determine the set of relative transformations that could correspond
to a pair of data edges. Note that this is not just the relative transformation
between the two data edges. Rather, we want the set of relative transformations of the
associated  model  edges  assigned  to  these  data  edges.  This  is  important  because  the 
problem  is  compounded  by  the  fact  that  the  data  edges  may  be  occluded,  so  that 
only  part  of  the  corresponding  model  edge  is  accounted  for,  and  by  the  fact  that  the 
data edges will be noisy. We assume that position measurements in the data are
accurate to within $\pm\epsilon_p$ and that angular measurements are accurate to
within $\pm\epsilon_a$.

Because we are only interested in relative transformations, without loss of
generality, we position the midpoint of data edge i at the origin of a coordinate frame,
with  its  normal  pointing  along  the  negative  y  axis.  The  position  of  the  second  data 
edge  j  relative  to  this  coordinate  frame  is  shown  in  Figure  5. 

Initially, we ignore the effects of noise. Because data edge j may be occluded,
the position of the midpoint of the corresponding model edge, if it were transformed
into this coordinate frame, would lie along the line defined by the tangent of edge
j and the midpoint of the edge, within a distance $(L_j - l_j)/2$ of the midpoint
of the data edge. When we allow for noise, we must consider any point that lies
within a distance $\epsilon_p$ of this line. This region of possible positions for
the midpoint of the model edge corresponding to data edge j is obtained by sweeping
a ball of radius $\epsilon_p$ along the line, through a distance of length
$L_j - l_j$ centered about the midpoint of the edge. This region is shown in
Figure 6.

Figure 5. The relative position of data edge j with respect to data edge i.

Figure 6. Set of positions for model edge center, given fixed edge.

Figure 7. Set of possible positions for relative transformation between two model
edges associated with a pair of data edges.

This  region  shows  the  range  of  possible  positions  for  the  midpoint  of  the  model 
edge  corresponding  to  data  edge  j,  given  that  data  edge  i  is  fixed.  Because  the 
midpoint  lies  at  the  origin,  this  also  gives  a  set  of  relative  translations.  But  edge  i 
has  the  same  problem,  namely  that  the  centerpoint  of  the  corresponding  model  edge 
may  actually  vary  in  position.  Because  we  are  interested  in  relative  transformations, 
we  can  obtain  the  full  set  of  possible  transforms  by  sweeping  the  entire  region  shown 
in Figure 6 over a distance of $\pm (L_i - l_i)/2$ along the x axis, and then taking
the set of points lying within a distance $\epsilon_p$ of this region. This new
region is shown in Figure 7.

This analysis implies that any model edge pair whose relative translation
component lies within this area can be considered for consistency. The analysis was
performed assuming that the orientation was correctly known. But the relative
orientation could also vary within $\pm\epsilon_a$ of the measured angle $\theta$
between the data edge normals. For each value, there is a corresponding region of
consistent relative translation, which actually changes shape and position, with the
center tracing a helical path in this space. Hence, the volume of relative
transformation space consistent with a pair of data-model pairings is a skewed
extension of the region shown in Figure 7. In the analysis that follows, however,
this skewing is not critical.

To estimate the probability of consistency p, we need to know the probability
that a pair of model faces has a relative transformation that falls within the volume
described above. To obtain useful results, we will assume that the data edges are
uniformly distributed in transform space, so that the probability of consistency is
a function only of the relative size of the volume, and not of its actual position in
transform space.

To obtain an expression for the volume, we begin with the area shown in
Figure 7. By breaking the region into subareas, we find that the total area is given
by

\[ A(\theta) = 4\pi\epsilon_p^2 + 4\epsilon_p [L_i - l_i + L_j - l_j] + (L_i - l_i)(L_j - l_j) |\cos\theta| \]

where $\theta$ is the angle between the two edges.

As we have noted, this region will change as $\theta$ varies over the range of
values consistent, to within the error bounds, with the measured value,
$[\theta_0 - \epsilon_a, \theta_0 + \epsilon_a]$. Thus, the volume of transform
space consistent with a pair of data-model assignments is

\[ V = \int_{\theta = \theta_0 - \epsilon_a}^{\theta_0 + \epsilon_a} A(\theta) \, d\theta
     = 2\epsilon_a \left[ 4\pi\epsilon_p^2 + 4\epsilon_p [L_i - l_i + L_j - l_j] \right]
       + 2 |\cos\theta_0| \sin\epsilon_a (L_i - l_i)(L_j - l_j). \]
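
As a sanity check on the integration step, the sketch below compares the closed form for the volume against a direct numerical integral of the area. It is a minimal illustration, assuming the area is read as $4\pi\epsilon_p^2 + 4\epsilon_p(s_i + s_j) + s_i s_j |\cos\theta|$ with $s_i = L_i - l_i$ and $s_j = L_j - l_j$; the function names and the parameter values are hypothetical.

```python
import math

def area(theta, eps_p, slack_i, slack_j):
    """A(theta), with slack_i = L_i - l_i and slack_j = L_j - l_j."""
    return (4 * math.pi * eps_p ** 2
            + 4 * eps_p * (slack_i + slack_j)
            + slack_i * slack_j * abs(math.cos(theta)))

def volume_closed_form(theta0, eps_a, eps_p, slack_i, slack_j):
    """Closed form for V; valid while cos(theta) keeps one sign on the window."""
    return (2 * eps_a * (4 * math.pi * eps_p ** 2 + 4 * eps_p * (slack_i + slack_j))
            + 2 * abs(math.cos(theta0)) * math.sin(eps_a) * slack_i * slack_j)

def volume_numeric(theta0, eps_a, eps_p, slack_i, slack_j, steps=2000):
    """Midpoint-rule integral of A(theta) over [theta0 - eps_a, theta0 + eps_a]."""
    width = 2 * eps_a / steps
    total = 0.0
    for k in range(steps):
        theta = theta0 - eps_a + (k + 0.5) * width
        total += area(theta, eps_p, slack_i, slack_j)
    return total * width

# Hypothetical sample values, chosen so that cos(theta) stays positive.
params = dict(theta0=0.4, eps_a=0.1, eps_p=0.05, slack_i=1.5, slack_j=2.0)
v_exact = volume_closed_form(**params)
v_check = volume_numeric(**params)
```

The closed form relies on the identity $\sin(\theta_0 + \epsilon_a) - \sin(\theta_0 - \epsilon_a) = 2\cos\theta_0 \sin\epsilon_a$, so it should only be trusted when the angular window does not straddle a zero of the cosine.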


To  get  an  estimate  of  the  expected  probability  of  consistency,  we  will  make 
some  simple  assumptions.  First,  we  will  assume  that  all  the  model  edges  have  the 
same length, $L_i = L$ for all i. We will also assume that the measured edge fragments have
at  least  some  minimum  length  h. 

Clearly, the worst case volume occurs for $|\cos\theta_0| = 1$ and
$l_i = l_j = h$. In this case, we have

\[ V_w = 8\epsilon_a [\pi\epsilon_p^2 + 2\epsilon_p (L - h)] + 2 \sin\epsilon_a (L - h)^2. \]

A more likely case is one in which the data edge lengths are uniformly and
independently distributed over the range $[h, L]$. In this case, by evaluating the
appropriate integrals, we find that the expected volume is given by

\[ V_u = 8\epsilon_a [\pi\epsilon_p^2 + \epsilon_p (L - h)] + |\cos\theta_0| \sin\epsilon_a \frac{(L - h)^2}{2}. \]

If we also assume that $\theta_0$ is uniformly distributed, then

\[ V_u = 8\epsilon_a [\pi\epsilon_p^2 + \epsilon_p (L - h)] + \sin\epsilon_a \frac{(L - h)^2}{\pi}. \]


Other  models  are  possible,  but  these  will  suffice  for  our  purposes. 
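
The averaging behind these expressions can be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes the per-pair volume to be $2\epsilon_a[4\pi\epsilon_p^2 + 4\epsilon_p(s_i + s_j)] + 2|\cos\theta_0|\sin\epsilon_a\, s_i s_j$ with $s = L - l$, averages it on a regular grid over data edge lengths in $[h, L]$ and angles in $[0, 2\pi)$, and compares the result with the closed forms. All function names and sample values are hypothetical.

```python
import math

def pair_volume(theta0, l_i, l_j, eps_a, eps_p, L):
    """Volume for one pair of data edges of lengths l_i, l_j at measured angle theta0."""
    s_i, s_j = L - l_i, L - l_j
    return (2 * eps_a * (4 * math.pi * eps_p ** 2 + 4 * eps_p * (s_i + s_j))
            + 2 * abs(math.cos(theta0)) * math.sin(eps_a) * s_i * s_j)

def worst_case_volume(eps_a, eps_p, L, h):
    """|cos(theta0)| = 1 and both data edges at the minimum length h."""
    return (8 * eps_a * (math.pi * eps_p ** 2 + 2 * eps_p * (L - h))
            + 2 * math.sin(eps_a) * (L - h) ** 2)

def expected_volume(eps_a, eps_p, L, h):
    """Lengths uniform on [h, L] and theta0 uniform on [0, 2*pi)."""
    return (8 * eps_a * (math.pi * eps_p ** 2 + eps_p * (L - h))
            + math.sin(eps_a) * (L - h) ** 2 / math.pi)

def grid_average_volume(eps_a, eps_p, L, h, n_len=20, n_theta=360):
    """Average pair_volume over a midpoint grid of lengths and angles."""
    total = 0.0
    for a in range(n_len):
        l_i = h + (a + 0.5) * (L - h) / n_len
        for b in range(n_len):
            l_j = h + (b + 0.5) * (L - h) / n_len
            for c in range(n_theta):
                theta0 = (c + 0.5) * 2 * math.pi / n_theta
                total += pair_volume(theta0, l_i, l_j, eps_a, eps_p, L)
    return total / (n_len * n_len * n_theta)

# Hypothetical sample values.
eps_a, eps_p, L, h = 0.1, 0.05, 2.0, 0.2
v_worst = worst_case_volume(eps_a, eps_p, L, h)
v_mean = expected_volume(eps_a, eps_p, L, h)
v_grid = grid_average_volume(eps_a, eps_p, L, h)
```

The factor $1/\pi$ in the last term arises from the mean of $(L - l_i)(L - l_j)$, which is $(L - h)^2/4$, multiplied by the mean of $|\cos\theta_0|$, which is $2/\pi$.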

Now,  we  need  to  relate  two  factors,  the  relative  transformation  associated  with 
a  pair  of  model  edges,  and  the  set  of  relative  transformations  consistent  with  a  pair  of 
model  edges  that  have  been  assigned  to  a  pair  of  data  edges.  Suppose  we  consider 
some  point  t  in  relative  translation  space.  We  need  to  have  an  expression  that 
denotes  the  probability  that  a  pair  of  model  edges  is  consistent  at  t,  which  we  call 
$f(t, \theta)$. We also need the probability that a pair of data edges would be consistent at
t  (or  rather  that  a  pair  of  model  edges  matched  to  this  pair  of  data  edges  would  be 
consistent  at  t).  We  will  assume  that  the  data  edges  are  uniformly  distributed  over 
relative transformation space, which has a range of $[0, 2\pi]$ in the rotational dimension
and which has a range of $[-D/2, D/2]$ in each of the translation dimensions, where
D  is  the  dimension  of  the  image.  In  this  case,  the  expected  probability  of  a  data 
pair  being  consistent  at  any  point  is  simply  given  by  the  relative  volumes.  If  we  let 


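\[ \epsilon_p^* = \frac{\epsilon_p}{L}, \qquad h^* = \frac{h}{L} \]

then the relative volumes are

\[ V_w^* = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + 2\epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{\pi} (1 - h^*)^2 \right] \left[ \frac{L}{D} \right]^2 \]

\[ V_u^* = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + \epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{2\pi^2} (1 - h^*)^2 \right] \left[ \frac{L}{D} \right]^2 \]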

for the worst case and uniform distribution case respectively. Thus, the expected
consistency is given by

\[ \int_{\omega = 0}^{2\pi} \int_{t = 0}^{D} t \, \mathrm{Prob}(\text{model consistent}) \, \mathrm{Prob}(\text{data consistent}) \, dt \, d\omega
   = V^* \int_{\omega = 0}^{2\pi} \int_{t = 0}^{D} t \, f(t, \omega) \, dt \, d\omega
   = V^*. \]

Because we assumed that the model edges were of equal length, we have

\[ L = \frac{P}{m} \]

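where P is the perimeter of the object. This finally reduces the probability of
consistency to

\[ p_2 = \frac{\kappa}{m^2} \]

where

\[ \kappa = \kappa_w = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + 2\epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{\pi} (1 - h^*)^2 \right] \left[ \frac{P}{D} \right]^2 \]

in the worst case, and

\[ \kappa = \kappa_u = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + \epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{2\pi^2} (1 - h^*)^2 \right] \left[ \frac{P}{D} \right]^2 \]

in the uniform distribution case.

Note that $\kappa$ is a constant that depends only on the error bounds on the sensor,
the perimeter of the object, and the size of the image. Unless the perimeter P is
very large compared to the size of the image D, we have $\kappa < 1$. If the sensor
error in measuring position and the minimum edge length are small relative to the
length of the model edges, then the constant reduces to

\[ \kappa_w \approx \frac{\sin\epsilon_a}{\pi} \left[ \frac{P}{D} \right]^2, \qquad
   \kappa_u \approx \frac{\sin\epsilon_a}{2\pi^2} \left[ \frac{P}{D} \right]^2. \]
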
Proposition 3: Given a two dimensional object with m equal sized edges of
length L, and given sensory data that is distributed uniformly in transform space
with a uniform distribution of lengths, the expected probability of two random
data-model pairings being consistent, $p_2$, is given by

\[ p_2 = \frac{\kappa}{m^2} \]

where

\[ \kappa = \kappa_w = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + 2\epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{\pi} (1 - h^*)^2 \right] \left[ \frac{P}{D} \right]^2 \]

in the worst case, and

\[ \kappa = \kappa_u = \left[ \frac{4\epsilon_a}{\pi} \left[ \pi (\epsilon_p^*)^2 + \epsilon_p^* (1 - h^*) \right] + \frac{\sin\epsilon_a}{2\pi^2} (1 - h^*)^2 \right] \left[ \frac{P}{D} \right]^2 \]

in the uniform distribution case, and where $\epsilon_a$ is a bound on the error in
measuring orientation, $\epsilon_p$ is a bound on the error in measuring position, h
is the minimum length data edge, $\epsilon_p^* = \epsilon_p / L$, $h^* = h / L$, P is
the perimeter of the object, and D is the dimension of the image. ∎
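
To get a feel for the magnitudes involved, the constant and the resulting pairwise consistency can be evaluated for sample values. This is a sketch under an explicit assumption: the constants are taken to be the worst-case and expected volumes divided by the size $2\pi D^2$ of transform space, with $L = P/m$. The function names and the sample object are hypothetical.

```python
import math

def kappa_worst(eps_a, eps_p_rel, h_rel, P, D):
    """Worst-case constant; eps_p_rel = eps_p / L and h_rel = h / L."""
    bracket = ((4 * eps_a / math.pi)
               * (math.pi * eps_p_rel ** 2 + 2 * eps_p_rel * (1 - h_rel))
               + (math.sin(eps_a) / math.pi) * (1 - h_rel) ** 2)
    return bracket * (P / D) ** 2

def kappa_uniform(eps_a, eps_p_rel, h_rel, P, D):
    """Constant for uniformly distributed data edge lengths and angles."""
    bracket = ((4 * eps_a / math.pi)
               * (math.pi * eps_p_rel ** 2 + eps_p_rel * (1 - h_rel))
               + (math.sin(eps_a) / (2 * math.pi ** 2)) * (1 - h_rel) ** 2)
    return bracket * (P / D) ** 2

def pairwise_consistency(kappa, m):
    """Probability that two random data-model pairings are consistent."""
    return kappa / m ** 2

# Hypothetical example: 12 edges, perimeter 240, image dimension 256.
m, P, D = 12, 240.0, 256.0
eps_a, eps_p, h = 0.1, 1.0, 4.0
L = P / m
p2_worst = pairwise_consistency(kappa_worst(eps_a, eps_p / L, h / L, P, D), m)
p2_uniform = pairwise_consistency(kappa_uniform(eps_a, eps_p / L, h / L, P, D), m)
```

For values of this order, both probabilities are far below one, which is what makes pairwise pruning effective.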

4.2  Consistency  in  the  three  dimensional  case 

A  similar  analysis  can  be  performed  for  the  case  of  three  dimensional  recognition. 
As  in  the  two  dimensional  case,  we  use  the  angle  between  two  face  normals,  and  the 
range  of  components  between  two  faces,  in  the  direction  of  each  of  the  normals  and 
in  the  direction  of  the  cross  product  of  the  normals,  to  prune  the  search.  Here  we 
assume  for  simplicity  that  each  object  is  modeled  by  m  faces,  each  a  square  of  size 
L.  Using  methods  similar  to  those  employed  in  the  previous  section,  we  can  show 
that the probability of consistency is proportional to $L^3$. In this case, the surface
area of the object S is related to the number of faces in the object by $S = mL^2$.
Hence  we  can  obtain: 


Proposition  4:  Given  a  three  dimensional  object  with  m  equal  sized  square 
faces  of  side  L ,  and  given  sensory  data  that  is  distributed  uniformly  in  transform 
space,  the  probability  of  two  random  data-model  pairings  being  consistent  p  is 
bounded  by 


where the constant is a dimensionless quantity depending on bounds on the error in
the sensory data and on the ratio of the surface area S to the size of the image. ∎

5. Specific Bounds on Recognition

The point of this analysis is that we can relate the probability of consistency $p_2$
to properties of the recognition problem, specifically to the amount of sensory error
relative to the object parameters ($\epsilon_a$, $\epsilon_p^*$, $h^*$), and the actual
parameters of the object itself (number of faces m and perimeter P). We can now use
this to establish particular bounds on the recognition method.


5.1  Bounds  on  the  non-occluded  case 

We begin with explicit bounds on the number of interpretations obtained in the
case  of  data  obtained  from  a  single  object.  In  the  appendix,  we  provide  a  proof  of 
the  following  assertion. 

Proposition  5:  If  all  of  the  k  sensory  measurements  are  known  to  lie  on  a 
single  two-dimensional  object  with  m  equal  sized  edges  of  length  L,  and  the  sensory 
data  is  distributed  uniformly  in  transform  space,  with  a  uniform  length  distribution, 
then the number of k-interpretations is rapidly asymptotic to 1. ∎

This  is  not  surprising,  because  it  says  that  if  we  exclude  objects  with  symmetries 
from  consideration,  and  if  we  have  enough  data  fragments  from  a  single  object,  there 
will  only  be  one  interpretation.  On  the  other  hand  it  is  reassuring  to  see  that  the 
analysis  correctly  predicts  this  effect.  For  most  objects  and  most  sensory  error 
ranges,  the  upper  bound  rather  rapidly  approaches  1,  so  that  even  with  k  =  3,  the 
expected  number  of  interpretations  is  basically  1.  This  is  consistent  with  empirical 
data. 

For  the  amount  of  search  needed  to  find  the  interpretations,  we  show  in  the 
appendix  that  under  some  simple  assumptions  on  the  amount  of  noise,  the  search 
is  at  most  quadratic  in  the  size  of  the  problem. 

Proposition  6:  If  all  of  the  k  sensory  measurements  are  known  to  lie  on  a 
single  two-dimensional  object  with  m  equal  sized  edges  of  length  L,  m  >  2,  the 
sensory  data  is  distributed  uniformly  in  transform  space,  with  a  uniform  length 
distribution,  and  if  the  noise  is  small  enough,  then  the  expected  amount  of  search 
needed  to  find  the  interpretation  is  bounded  by 

\[ m^2 < N_s < m^2 + a m s \]

where a is a constant that depends on the object characteristics and the amount of
noise in the sensory measurements. ∎
In the appendix, we provide a proof of this, giving a specific definition of “small
enough”, and a specific definition of the constant a. In particular, we note that
the conditions for the definition of “small enough” are satisfied for most sensing
situations. For example, if the relative sensing error and the minimum edge length
are  .1,  that  is,  the  error  in  determining  position  is  no  more  than  one  tenth  the  length 
of  the  model  edges,  then  so  long  as  the  perimeter  of  the  object  is  less  than  5  times 
the  dimension  of  the  image,  the  proposition  is  satisfied.  Even  when  the  error  rises 
to  .5,  the  perimeter  can  be  roughly  as  large  as  the  image  dimensions. 

Note  that  the  two  bounds  are  reasonably  close.  Also  note  that  under  the 
assumptions  of  the  analysis,  in  general,  we  need  only  explore  m(s  +  m)  nodes. 
Because  there  are  ms  possible  initial  hypotheses  for  pairing  data  edges  with  model 
edges,  this  implies  that  the  constrained  search  method  will  rapidly  converge  to  the 
correct  interpretation. 
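
The gap between the two counts is easy to see with a little arithmetic. The snippet below is illustrative only; it compares the node count $m(s + m)$ quoted above with the $m^s$ leaves of an exhaustive assignment of one of m model edges to each of s data edges, for a few hypothetical problem sizes.

```python
def nodes_explored_estimate(m, s):
    """m(s + m): the node count quoted in the text for single-object data."""
    return m * (s + m)

def unconstrained_interpretations(m, s):
    """m**s: every assignment of one of m model edges to each of s data edges."""
    return m ** s

# Hypothetical problem sizes (m model edges, s data edges).
for m, s in [(6, 4), (10, 8), (20, 12)]:
    print(m, s, nodes_explored_estimate(m, s), unconstrained_interpretations(m, s))
```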

This  analysis  has  been  performed  using  a  model  in  which  the  consistency  of  a 
pair  of  model-data  assignments  was  taken  as  1  if  the  assignment  were  correct,  and 
as  p  if  not.  This  excludes  objects  with  partial  symmetries  from  consideration.  Note 
that  we  could  amplify  our  analysis  by  generalizing  the  notion  of  consistency  to: 

\[ \begin{cases}
1, & \text{if both assignments are correct} \\
q, & \text{if only one assignment is correct} \\
p, & \text{if neither assignment is correct.}
\end{cases} \]

For  the  case  of  three  dimensional  recognition,  a  similar  result  holds: 

Proposition 7: If all of the k sensory measurements are known to lie on a
single three-dimensional object with m equal sized edges of dimension L, m > 2,
the sensory data is distributed uniformly in transform space, with a uniform area
distribution, and if the noise is small enough, then the number of interpretations is
asymptotic to 1, and the expected amount of search needed to find the interpretation
is bounded by


<  Na  <  m  m  +  -f  s 


Both of these results indicate that while the total number of possible
interpretations is exponential, namely $m^s$, the constrained search method is quite effective
at  finding  the  correct  interpretation,  requiring  only  a  quadratic  amount  of  search. 
This  result  is  reflected  in  empirical  studies.  It  suggests  that  the  constraints,  even  in 
the  presence  of  sensor  noise,  are  quite  powerful.  The  analysis  has  excluded  objects 
with  symmetries,  so  that  in  practical  situations  the  amount  of  search  may  be  larger, 
but  it  is  expected  to  remain  polynomial  in  the  problem  size. 

5.2  Bounds  on  the  occluded  case 

We  can  use  similar  methods  to  reduce  the  rather  messy  expressions  we  derived 
earlier for the expected number of interpretations in the case of occluded data. The
appendix  contains  a  proof  of  the  following. 


Proposition 8: If $c_0$ of the k sensory measurements lie on a two-dimensional
object with m equal sized edges of length L, the sensory data is distributed uniformly
in transform space, with a uniform length distribution, and if the noise is small
enough, then the expected number of interpretations, for m large, is bounded by

\[ 2^{c_0} < n_k < 2^{c_0} + [1 + p_1 \kappa]^k + p_1 m k \]

where $\kappa$ is a constant that depends on the object characteristics and the amount
of sensor noise, and $p_1$ is the probability of a random data-model assignment
satisfying unary consistency. ∎


The  lower  bound  is  not  as  tight  as  we  could  make  it,  but  we  will  use  this  simple 
bound  for  convenience. 

Note that the bounds in Proposition 8 make intuitive sense. Consider the
correct interpretation, which involves the correct assignment of $c_0$ of the data
points. Not only will this assignment lead to an interpretation, but so will any
subset of this assignment. Hence, there must be at least as many interpretations as
there are subsets of the $c_0$ correct assignments, which accounts for the $2^{c_0}$
term. Any interpretation of length 1 will also be included, because only pairwise
constraints are used to reduce the search. This accounts for the mk term. The
remaining terms essentially imply that if the sensory error bounds are large enough,
some additional interpretations will also be included. If, however, the sensory error
bounds are small enough that $\kappa < 1$, then basically only the interpretations
described above will be found.

Note,  of  course,  that  these  interpretations  involve  different  amounts  of  real 
matches. As discussed in [Grimson and Lozano-Perez 87], we can adjust our
recognition method to accept the longest (in terms of number of data points accounted
for)  interpretation.  This  adjustment  will  in  fact  reduce  the  overall  amount  of  search 
required,  because  the  depth  first  search  may  be  terminated  at  any  node  such  that 
even  if  all  the  nodes  below  that  point  were  to  be  correctly  matched,  the  length  of 
the  resulting  interpretation  will  be  less  than  the  best  interpretation  found  so  far. 

For  the  amount  of  search  expected  in  the  occluded  case,  we  can  use  the  above 
result  to  obtain  the  following  (a  proof  is  found  in  the  appendix). 


Proposition 9: If $c_0$ of the k sensory measurements lie on a two-dimensional
object with m equal sized edges of length L, the sensory data is distributed uniformly
in  transform  space,  with  a  uniform  length  distribution,  and  if  the  noise  is  small 
enough,  then  the  expected  amount  of  search  needed  to  find  the  interpretations,  for 
m large, is bounded by


where


and where $\kappa$ is a constant that depends on the object characteristics and the
amount of sensor noise, and $p_1$ is the probability of a random data-model assignment
satisfying unary consistency. ∎


We  can  see  from  this  result  that  the  introduction  of  the  wild  card  match  puts 
our search method back into the exponential domain, although the amount of work
is  still  considerably  less  than  the  normal  British  Museum  algorithm  search.  The 
bounds  are  not  tight,  since  we  used  a  number  of  approximations  in  deriving  them. 
Note  that  the  lower  bound  consists  of  two  terms 

\[ m 2^{c_0 + 1} \quad \text{and} \quad m(s - c_0). \]

Depending  on  the  actual  values  for  the  parameters,  one  of  these  two  terms  will 
dominate,  but  for  most  situations,  the  exponential  term  is  likely  to  be  the  larger. 
For  the  upper  bound,  there  are  essentially  four  different  major  terms 

to [l  +  pi k] 3  m(s  -  c0)2Co  m2(s2  -  Cq)[i  +  ~ 


Again,  depending  on  the  actual  values  of  the  parameters,  one  of  these  terms  will 
dominate.  For  example,  if  the  noise  in  the  sensory  data  is  large,  and  there  are  a 
large  number  of  spurious  measurements,  the  first  term  will  dominate.  On  the  other 
hand,  if  the  noise  is  small,  the  second  term  is  likely  to  dominate. 

Nonetheless,  the  analysis  implies  that  in  general  the  introduction  of  spurious 
data  and  the  use  of  the  wild  card  branch  in  a  constrained  search  method  forces  the 
expected  complexity  of  the  method  into  the  exponential  domain. 

A  similar  analysis  may  be  done  for  the  three-dimensional  case. 

5.3  Branch  and  bound  search 

One  way  to  decrease  the  work  involved  in  finding  an  interpretation  is  to  use  a  type 
of  branch  and  bound  search.  In  particular,  suppose  that  at  each  stage  during  the 
constrained  search,  we  keep  track  of  the  longest  (measured  in  terms  of  the  number  of 
data  points  assigned  to  non-wild  card  model  faces)  interpretation  we  have  found  so 
far. Suppose we reach a non-leaf node of the interpretation tree, such that the sum
of the non-wild card matches assigned so far, plus the number of remaining data
points to consider (i.e. the remaining levels of the tree between the current point and
the leaves of the tree), is less than the length of the best interpretation found so far.
In this case, we cannot find a better interpretation below this point in the tree, so
we can terminate our downward search and backtrack. In principle, such a branch
and bound technique should reduce the amount of search performed in finding the
best interpretation. We can place a bound on the amount of search in this case by
noting that in the best possible case, we would discover an interpretation of length
$c_0$ along the first branch of the tree. As a consequence, the remainder of the search
would only have to consider a tree of depth $s - c_0$. Unfortunately, this does not
change the lower bound, only the upper bound. Hence, to reduce the search further,
we need some additional techniques for restricting the size of the search space. We
next consider the use of Hough transforms.
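
The pruning rule described in this section can be sketched as a small depth-first search. This is a minimal illustration, not the RAF implementation: the pairwise test `consistent` is a hypothetical stand-in for the angle and component constraints, `None` plays the role of the wild card, and the toy ground truth exists only to exercise the search.

```python
WILD = None  # wild card: the data fragment is treated as spurious

def best_interpretation(num_data, num_model, consistent):
    """Depth-first interpretation tree search with the branch and bound cutoff.

    consistent(d1, m1, d2, m2) is a hypothetical pairwise test between the
    pairings (d1, m1) and (d2, m2). Returns (best assignment, nodes visited).
    """
    best, best_len, nodes = [], 0, 0

    def search(assign):
        nonlocal best, best_len, nodes
        nodes += 1
        depth = len(assign)
        matched = sum(1 for a in assign if a is not WILD)
        if depth == num_data:
            if matched > best_len:
                best, best_len = list(assign), matched
            return
        # Cutoff: even matching every remaining data point cannot reach the best so far.
        if matched + (num_data - depth) < best_len:
            return
        for model in list(range(num_model)) + [WILD]:
            if model is not WILD and not all(
                    prev is WILD or consistent(i, prev, depth, model)
                    for i, prev in enumerate(assign)):
                continue
            assign.append(model)
            search(assign)
            assign.pop()

    search([])
    return best, nodes

# Toy problem: data point 2 is spurious; the others come from model edges 0, 1, 2.
truth = [0, 1, None, 2]

def toy_consistent(d1, m1, d2, m2):
    # Stand-in for the geometric constraints: only correct pairs are mutually consistent.
    return m1 == truth[d1] and m2 == truth[d2]

best, nodes = best_interpretation(len(truth), 3, toy_consistent)
```

Note that the cutoff only helps once a long interpretation has been found, which is why the discussion above treats early discovery of the correct interpretation as the best case.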


6.  Hough  transforms 


The analysis in the previous sections argues that while the constrained search
technique is quite effective when it is known that all of the sensory data comes from
a  single  object,  the  expected  search  effort  is  exponential  in  the  size  of  the  correct 
interpretation  when  spurious  data  is  allowed.  This  increase  in  required  search  has 
also  been  observed  in  empirical  tests.  The  increased  cost  arises  in  part  because  the 
use  of  the  wild  card  branch  as  a  means  of  separating  real  from  spurious  data  is 
not  particularly  efficient.  One  way  to  improve  the  performance  of  our  recognition 
engine  is  to  provide  a  method  for  selecting  candidate  subspaces  of  the  search  space, 
that  are  much  smaller  than  the  full  search  space  and  that  have  a  high  likelihood 
of  containing  little  or  no  spurious  data.  In  our  experimental  work,  we  have  done 
this  using  a  Hough  transform  [e.g.  Hough  62,  Merlin  and  Farber  75,  Sklansky  78, 
Ballard  81]. 

In  brief,  we  use  the  Hough  transform  as  follows.  Each  possible  pose  of  an  object 
can  be  described  by  specifying  the  parameters  of  the  rigid  transformation  needed 
to  take  the  object  from  its  inherent  coordinate  system  into  the  sensory  coordinate 
system. In the case of two dimensional data, for example, a transformation can be
described  by  an  angle  of  rotation  and  a  two  dimensional  vector  of  translation.  Each 
transformation  can  be  represented  as  a  point  in  a  space  of  transformations,  having 
one  dimension  for  the  rotation  angle,  and  one  dimension  for  each  of  the  translation 
components. We tessellate this space into buckets, using some predefined spacing,
$h_\theta$, $h_x$, $h_y$.

One  way  to  extract  candidate  subspaces  of  the  search  space  is  to  find  pairings 
of  data  and  model  segments  that  are  consistent  with  the  same  pose  of  the  object. 
Thus, for each sensory data fragment $d_i$, we compute the transformation needed
to align that fragment with each of the model fragments, $m_j$, in turn. Then, that
pairing $(d_i, m_j)$ is placed into the tessellation bucket in the transform space in
which the corresponding transform lies. We do this for all pairings of data and model
fragments. When completed, each Hough bucket contains a limited set of data
fragments, each of which is associated with a limited set of model patches. The
expectation is that random data-model pairings will be dispersed in the tessellated
space, while the correct data-model pairings will all fall within the same bucket,
because they correspond to the same pose of the object. Hence, by sorting the
buckets on the number of votes (or pairings) they contain, we can isolate likely
candidate subspaces.
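
The voting scheme just described can be sketched as follows. This is a simplified illustration under stated assumptions: each fragment is reduced to a midpoint and a tangent angle, each pairing casts exactly one vote (noise and occlusion are ignored here), and the function names, bucket spacings, and test data are all hypothetical.

```python
import math
from collections import defaultdict

def aligning_transform(data_frag, model_frag):
    """Rigid transform (theta, tx, ty) taking the model fragment onto the data fragment.

    Each fragment is (x, y, angle): midpoint position and tangent direction.
    """
    dx, dy, da = data_frag
    mx, my, ma = model_frag
    theta = (da - ma) % (2 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * mx - s * my), dy - (s * mx + c * my)

def hough_buckets(data, model, h_theta, h_x, h_y):
    """Vote every (data, model) pairing into a bucket; return buckets sorted by votes."""
    buckets = defaultdict(list)
    for i, d in enumerate(data):
        for j, m in enumerate(model):
            theta, tx, ty = aligning_transform(d, m)
            key = (math.floor(theta / h_theta),
                   math.floor(tx / h_x),
                   math.floor(ty / h_y))
            buckets[key].append((i, j))
    return sorted(buckets.items(), key=lambda kv: len(kv[1]), reverse=True)

def place(frag, theta, tx, ty):
    """Apply a pose to a model fragment to synthesize a data fragment."""
    x, y, a = frag
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty, a + theta)

# Hypothetical model, pose, and one spurious data fragment.
model = [(0.0, 0.0, 0.0), (4.0, 1.0, 1.0), (2.0, 5.0, 2.0)]
pose = (0.3, 5.0, -3.0)
data = [place(f, *pose) for f in model] + [(1.0, 1.0, 0.9)]
ranked = hough_buckets(data, model, h_theta=0.2, h_x=2.0, h_y=2.0)
top_key, top_pairings = ranked[0]
```

In this idealized setting the three correct pairings share a bucket while the incorrect ones scatter. With real data, each pairing would vote for a set of buckets rather than one, which is the source of the difficulties discussed next.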

In the ideal case, the bucket with the largest vote will actually identify the
correct interpretation, and since the bucket also defines the associated transformation,
in principle, we are done. In practice, however, the Hough transform is not sufficient
on its own for solving the recognition problem posed here. There are several reasons
for this. The first is that in practice one cannot use infinitesimal sized tessellation
buckets.  Since  the  Hough  bucket  has  a  finite  size,  any  data-model  pairing  that  falls 
within  that  bucket  will  contribute  to  the  vote  in  that  bucket.  As  the  size  of  the 
bucket  grows,  the  difference  in  transform  between  data-model  pairs  that  will  be 
associated  together  also  grows.  This  means  that  spurious  data-model  pairings  may 
be  accidentally  grouped  together,  potentially  scoring  a  larger  vote  than  the  correct 
interpretation.  As  well,  spurious  data-model  pairings  may  be  accidentally  included 
with  the  correct  pairings,  meaning  that  additional  effort  is  needed  to  isolate  the 
correct pairs in a bucket, in order to find the actual size of the interpretation.
Secondly, a data-model pairing will in general cast a vote in several Hough buckets,
not  just  a  single  one.  Error  in  the  sensory  data  will  give  rise  to  a  set  of  consistent 
transformations,  rather  than  a  single  one.  Also,  occlusion  may  cause  a  data  edge  to 
correspond to only a part of a model edge. As a consequence, there is a set of
corresponding transformations, one for each possible position of the smaller data edge
on the model edge. This implies that each data-model pairing contributes to several
Hough buckets, say r, so that the noise level in the transform space is amplified
considerably. Finally, while the spurious data-model pairings may well be distributed in
the  Hough  space,  the  sheer  number  of  such  pairings  may  potentially  drown  out  the 
size  of  the  vote  in  the  correct  Hough  bucket.  For  example,  if  the  replication  factor  is 
r  as  above,  and  there  are  m  model  fragments  and  s  data  fragments,  then  there  are 
$msr$ different pairings, of which $c_0$ are correct. This means that there are
$msr - c_0$ noisy pairings distributed throughout the Hough space. If there are b
buckets, then the average noise contribution to a Hough bucket is $(msr - c_0)/b$,
which can clearly be of significant size relative to $c_0$, the size of the correct
interpretation.
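
A quick calculation shows how large this background can be. The snippet is a sketch only, taking the average to be the number of noisy pairings, m times s times r minus the correct ones, divided by the number of buckets b; the function name and the sample values are hypothetical.

```python
def average_noise_per_bucket(m, s, r, c0, b):
    """Average number of incorrect pairings per Hough bucket.

    m model fragments, s data fragments, replication factor r,
    c0 correct pairings, b buckets.
    """
    noisy_pairings = m * s * r - c0
    return noisy_pairings / b

# Hypothetical scene: 20 model edges, 50 data edges, each pairing hits 5 buckets.
avg = average_noise_per_bucket(m=20, s=50, r=5, c0=10, b=1000)
```

With these values the average background is about five votes per bucket, which is already half the size of a correct interpretation of ten fragments.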

The effect of all this is that while the Hough transform can be used to order
candidate subspaces, it is likely in practical circumstances both that the Hough
buckets with the largest number of entries may not contain a correct interpretation,
and that a Hough bucket containing a correct interpretation is also likely to have
some spurious data fragments included and to have some additional model patches
associated with correct data fragments. We see this effect in running the RAF system.
Hence, in our empirical studies, we have used the Hough transform to select candidate
subspaces, ranked in order. We then apply the RAF technique to the subtree defined
by  the  Hough  bucket,  that  is,  we  use  constrained  search  on  a  tree  whose  levels 
correspond  only  to  those  data  fragments  that  are  contained  within  the  bucket,  and 
for  each  such  fragment,  we  only  consider  those  model  fragments  associated  with  it  as 
possible  matches.  We  take  the  Hough  buckets  in  order,  applying  the  RAF  technique 
to  each  in  order,  terminating  the  search  when  a  correct  interpretation  of  sufficient 
size  is  found  within  a  bucket. 
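
The bucket-ordered strategy described above can be sketched as follows. This is only an outline under stated assumptions: `constrained_search` is a hypothetical stand-in for the RAF search (it is not the RAF implementation), buckets are assumed to be lists of (data fragment, model fragment) index pairs, and `min_size` plays the role of "sufficient size".

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Pairing = Tuple[int, int]  # (data fragment index, model fragment index)


def search_buckets_in_order(
    buckets: Dict[Hashable, List[Pairing]],
    constrained_search: Callable[[Dict[int, List[int]]], Optional[List[Pairing]]],
    min_size: int,
) -> Optional[List[Pairing]]:
    """Visit buckets by decreasing vote count; stop at the first bucket whose
    restricted search yields an interpretation of at least min_size pairings."""
    ranked = sorted(buckets.values(), key=len, reverse=True)
    for pairings in ranked:
        # Restrict the tree: only data fragments in the bucket, and for each
        # of them only the model fragments paired with it in the bucket.
        candidates: Dict[int, List[int]] = {}
        for data_idx, model_idx in pairings:
            candidates.setdefault(data_idx, []).append(model_idx)
        interpretation = constrained_search(candidates)
        if interpretation is not None and len(interpretation) >= min_size:
            return interpretation
    return None


def toy_search(candidates: Dict[int, List[int]]) -> List[Pairing]:
    """Hypothetical placeholder: accept the first model fragment per data fragment."""
    return [(d, ms[0]) for d, ms in sorted(candidates.items())]


toy_buckets = {"A": [(0, 1), (1, 2)], "B": [(0, 3), (1, 4), (2, 5)]}
result = search_buckets_in_order(toy_buckets, toy_search, min_size=3)
```

The point of the sketch is the control flow: ranking is cheap, and the expensive search only ever sees the reduced tree defined by one bucket at a time.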


6.1  Bounds  on  occluded  recognition,  using  Hough 


This  argument  implies  that  one  cannot  assume  that  the  data-model  pairings  defined 
by  the  contents  of  a  Hough  bucket  correspond  to  a  correct  segmentation  of  the  data 
into  elements  that  are  guaranteed  to  lie  on  the  object.  This  is  unfortunate,  since  it 
means  that  the  expected  complexity  is  still  in  the  exponential  domain.  Fortunately, 
for practical purposes, the actual size of the search complexity is considerably
reduced, since the parameters of the search problem are also reduced.

We can demonstrate this as follows. Suppose that the contents of a Hough
bucket define a new interpretation tree, in which the number of model fragments
associated with a data fragment is $m'$, where $m' \ll m$ (as we have observed in
practice). Also, suppose that the probability of a random data-model pairing falling
within a bucket is given by $p_r$, so that the expected number of data points
contributing to a Hough bucket containing the correct interpretation is
$s' = c_0 + p_r(s - c_0)$.
The bounds on the amount of search required to isolate the correct interpretation
are given by the results of Proposition 9, with $s$ replaced by $s'$ and $m$ replaced by
$m'$. While the expressions are still exponential in form, the key is to observe that the
parameters have been reduced from their previous values. In the limit, as $m' \to 1$
and $s' \to c_0$, the bounds tend to
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where 


a  =  k 


Hence, the expressions remain exponential, but are tighter than the previous
ones. In fact, much tighter upper bounds can be established in this case, but the
key point is that the bounds remain exponential. In practical terms, this suggests
that  for  many  problems,  the  constrained  search  approach  may  still  be  applicable, 
if  the  characteristics  of  the  problem  are  small  enough.  In  our  empirical  testing 
of  the  RAF  system,  for  example,  elapsed  times  on  the  order  of  a  few  seconds  are 
commonly  observed.  As  the  problem  size  grows,  however,  and  especially  when  the 
scenes  become  complex,  the  combinatorics  suggests  that  an  exponential  search  is 
required  and  this  suggests  that  other  techniques  are  needed  to  reduce  the  cost  of 
recognition. 
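
To make the parameter reduction in this section concrete, the sketch below evaluates the expected number of data fragments in the correct bucket, $s' = c_0 + p_r(s - c_0)$. The function name and the numbers are hypothetical; only the formula comes from the text.

```python
def expected_bucket_data(s: int, c0: int, p_r: float) -> float:
    """Expected data fragments in the correct bucket: c0 + p_r * (s - c0)."""
    if not 0.0 <= p_r <= 1.0:
        raise ValueError("p_r must be a probability")
    if not 0 <= c0 <= s:
        raise ValueError("c0 must lie between 0 and s")
    return c0 + p_r * (s - c0)


# Hypothetical scene: 100 data fragments, 10 on the object, 5% bucket hit rate.
s_prime = expected_bucket_data(s=100, c0=10, p_r=0.05)
print(s_prime)
```

Because the bounds are exponential in the problem parameters, even a modest drop from $s$ to $s'$ of this kind can change the practical cost considerably.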




7.  Implications  of  the  combinatorics 


The goal of this paper was to establish a theoretical basis in support of empirical
observations of the utility of a constrained search approach to object recognition. Our
experience with RAF suggested that when the sensory data could be assumed to all
lie on a single object, the system was very efficient at finding correct interpretations.
When spurious data was introduced, however, the use of a wild card branch as the
last resort to remove data fragments from consideration led to a strong increase
in the amount of work required to find correct interpretations. The analysis in this
paper supports this observation, showing that, under some simple assumptions, the
expected search in the case of isolated data is quadratic in the number of model
fragments and the number of sensory fragments, while the expected search in the case of
spurious data is bounded by an expression that is generally dominated by the product
of the number of data fragments, the number of model fragments and an exponential
denoting the magnitude of the power set of the correct interpretation. While the
size of this bound is considerably smaller than that associated with British Museum
search, it is still exponential.

To  some  extent,  these  results  are  not  surprising.  Search  methods  are  well 
known  to  be  computationally  expensive.  Indeed,  some  very  successful  approaches  to 
recognition  use  maximal  clique  techniques  to  find  the  correct  interpretations  [Bolles 
and Cain 82, Bolles et al. 84], and the maximal clique problem is known to be
NP-complete. This simply implies that as the characteristics of the problem domain
grow, such approaches may lead to poor solutions, but that for many instances of
the problem, the performance is acceptable.

At  the  same  time,  however,  the  analysis  implies  that  a  general  solution  to  the 
recognition  problem  will  require  additional  methods  to  reduce  the  combinatorics. 

One class of methods involves the use of measures of fit to terminate the search.
For example, one can terminate the search once an interpretation is found that
accounts for some predefined percentage of the object model. We have used such a
technique in applying RAF [Grimson and Lozano-Perez 87], and have found that it
can significantly reduce the search cost. The drawback, of course, is in deciding what
constitutes  an  appropriate  measure,  and  what  constitutes  an  appropriate  threshold 
for  termination.  Depending  on  the  threshold  chosen,  such  termination  procedures 
may  run  the  danger  of  accepting  false  positives. 

A  second  approach  is  to  use  grouping  to  reduce  the  search,  and  the  analysis  in 
this  note  suggests  strong  support  for  the  importance  of  grouping  in  recognition.  If 
one  can  identify  groups  of  sensory  fragments  that  are  likely  to  have  come  from  a 
single  object,  without  exponential  cost  in  identifying  such  groups,  then  it  is  likely 
that  the  expected  cost  of  the  search  process  associated  with  recognizing  an  object  can 
be  reduced  to  practical  levels.  While  the  Hough  transform  provides  a  simple  method 
for doing this, more robust techniques are also emerging, for example, [Jacobs 88].
As such grouping techniques continue to develop, the efficiency and robustness of
associated recognition methods should also improve.

Appendix


In  the  appendix,  we  establish  formal  proofs  for  the  results  cited  in  the  text.  We 
begin  with  bounds  on  the  number  of  interpretations,  for  data  from  a  single  object. 


Proposition 1: If all of the k sensory measurements are known to lie on a
single object with m faces, then the number of interpretations $n_k$ is bounded by

$$n_k < \left[1 + (m-1)\,p_1\,p_2^{\frac{k-1}{2}}\right]^k$$

and by

$$n_k > 1 + \left[1 + p_1(m-1)\right]^k p_2^{\frac{k(k-1)}{2}} - p_2^{\frac{k(k-1)}{2}},$$

where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency. ∎


Proof: To determine the number of nodes of the tree at the kth level of the
tree, we note that each such node defines a k-interpretation, that is, an assignment
of model faces to the first k data fragments. For such an interpretation, there can
be i correct assignments, where i = 0, 1, ..., k. The i data points that are correctly
assigned to model faces may be chosen in

$$\binom{k}{i}$$

ways. For the remaining $k - i$ incorrect assignments, there are $m - 1$ possible choices
for the assignment of each such incorrect label. By considering all possible values
for i, we see that there are

$$\sum_{i=0}^{k}\binom{k}{i}(m-1)^{k-i}$$

nodes at this level. We need to determine which of these are actually consistent.
For each node, there are $k - i$ incorrect assignments, and the probability that these
all pass the unary constraint is

$$p_1^{k-i}.$$

There are also

$$\binom{k}{2}$$

different pairwise constraints, of which

$$\binom{i}{2}$$

involve correct pairs, that have probability of consistency of 1. The rest of the
pairs have a probability of consistency $p_2$. Thus, the probability of a node being
consistent with the binary constraints is given by

$$p_2^{\binom{k}{2}-\binom{i}{2}}.$$

Putting this all together, we obtain

$$n_k = \sum_{i=0}^{k}\binom{k}{i}(m-1)^{k-i}\,p_1^{k-i}\,p_2^{\binom{k}{2}-\binom{i}{2}}. \qquad (1)$$

Note, by the way, that if $p_1 = p_2 = 1$, this reduces to

$$\sum_{i=0}^{k}\binom{k}{i}(m-1)^{k-i} = (m - 1 + 1)^k = m^k,$$

which is the correct expression for the total number of nodes possible at level k of
the tree.
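
The counting argument in this proof can be checked numerically. The sketch below evaluates the expected number of consistent nodes at level k, weighting each node that has i correct assignments by $p_1^{k-i}\,p_2^{\binom{k}{2}-\binom{i}{2}}$, and confirms that the count collapses to $m^k$ when both probabilities are 1. The function name and sample values are illustrative assumptions.

```python
from math import comb


def expected_consistent_nodes(k: int, m: int, p1: float, p2: float) -> float:
    """Sum over i correct assignments of
    C(k,i) * (m-1)^(k-i) * p1^(k-i) * p2^(C(k,2) - C(i,2))."""
    total = 0.0
    for i in range(k + 1):
        wrong = k - i                          # incorrect assignments
        exponent = comb(k, 2) - comb(i, 2)     # pairs with an incorrect member
        total += comb(k, i) * (m - 1) ** wrong * p1 ** wrong * p2 ** exponent
    return total


# With no constraints (p1 = p2 = 1) every node survives: m^k of them.
unconstrained = expected_consistent_nodes(k=4, m=5, p1=1.0, p2=1.0)
# Hypothetical constraint strengths: the count drops sharply.
constrained = expected_consistent_nodes(k=4, m=5, p1=0.3, p2=0.05)
print(unconstrained, constrained)
```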

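Now, we want to obtain bounds on the expression in equation (1). To obtain
an upper bound on the expression, we can substitute a smaller exponent for the
power of $p_2$, because $p_2 < 1$ implies that a lower exponent will result in a larger
expression. In particular, we have

$$n_k < \sum_{i=0}^{k}\binom{k}{i}(m-1)^{k-i}\,p_1^{k-i}\,p_2^{\frac{(k-i)(k-1)}{2}}.$$

But this simplifies to

$$n_k < \sum_{i=0}^{k}\binom{k}{i}\left[(m-1)\,p_1\,p_2^{\frac{k-1}{2}}\right]^{k-i}
= \left[1 + (m-1)\,p_1\,p_2^{\frac{k-1}{2}}\right]^k.$$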
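
As a sanity check on the upper bound, the following sketch compares the exact sum over i with the closed form $[1 + (m-1)p_1p_2^{(k-1)/2}]^k$ for a range of k. The parameter values are hypothetical, and the comparison is numerical evidence for these values only, not a proof.

```python
from math import comb


def exact_count(k: int, m: int, p1: float, p2: float) -> float:
    """Expected consistent nodes at level k (sum over i correct assignments)."""
    return sum(
        comb(k, i) * ((m - 1) * p1) ** (k - i) * p2 ** (comb(k, 2) - comb(i, 2))
        for i in range(k + 1)
    )


def upper_bound(k: int, m: int, p1: float, p2: float) -> float:
    """Closed-form upper bound [1 + (m-1) p1 p2^((k-1)/2)]^k."""
    return (1.0 + (m - 1) * p1 * p2 ** ((k - 1) / 2)) ** k


# Hypothetical parameters: 10 faces, fairly selective constraints.
rows = [(k, exact_count(k, 10, 0.3, 0.05), upper_bound(k, 10, 0.3, 0.05))
        for k in range(1, 9)]
for k, exact, bound in rows:
    print(k, round(exact, 4), round(bound, 4))
```

Both columns approach 1 as k grows, which is the behaviour the later propositions rely on.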


For a lower bound, we can first expand out the $i = k$ term, and then replace
the exponent for $p_2$ by a larger expression:

$$n_k > 1 + \sum_{i=0}^{k-1}\binom{k}{i}(m-1)^{k-i}\,p_1^{k-i}\,p_2^{\frac{k(k-1)}{2}}
= 1 + \left[1 + p_1(m-1)\right]^k p_2^{\frac{k(k-1)}{2}} - p_2^{\frac{k(k-1)}{2}}. \;∎$$

For occluded objects, bounds on the expected number of interpretations are given
by the following result.


Proposition 2: Given an object with m faces and given k sensory data points,
of which c actually lie on the object, the number of interpretations $n^*_k$ is bounded
by

$$n^*_k < 2^c - [1 + p_2]^c + \left[1 + m p_1 p_2^{1/2}\right]^{k-c}\left[p_2 + 1 + m p_1 p_2^{1/2}\right]^c
+ m p_1\left[1 - p_2^{1/2}\right][1 + p_2]^{c-1}\,[k + p_2(k - c)]$$

and  by 
where $p_1$ is the probability of a random data-model assignment satisfying unary
consistency, and $p_2$ is the probability of a pair of random data-model assignments
satisfying binary consistency.


Proof: 

A node at the kth level of the tree defines a k-interpretation, assigning model
faces to the first k data fragments. Each such interpretation can be specified by
choosing j (out of c) of the data points lying on the object to be correctly matched
to a model face, and choosing $r - j$ of the remaining data points (either lying on the
object or not) to be incorrectly matched, with the remaining data points assigned to
the wild card. Such an interpretation would have r actual matches, and $k - r$ wild
card matches. We denote by $n_{k,r}$ the number of such k,r-interpretations. Note that
for each of the $r - j$ selections, there is an upper bound of m possible assignments,
and a lower bound of $m - 1$ assignments.
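
The counting in this paragraph can be verified by brute force on a small tree, ignoring the consistency constraints. The sketch below enumerates every labeling of k data points with either the wild card or one of m faces, tallies labelings by (r, j), and checks that each tally lies between $\binom{c}{j}\binom{k-j}{r-j}(m-1)^{r-j}$ and $\binom{c}{j}\binom{k-j}{r-j}m^{r-j}$. As an assumption made purely for the illustration, data points 0..c-1 lie on the object and the correct face of point i is face i (so c must not exceed m).

```python
from itertools import product
from math import comb
from typing import Dict, Tuple

WILD = -1  # wild card label


def count_by_enumeration(k: int, c: int, m: int) -> Dict[Tuple[int, int], int]:
    """Tally all (m+1)^k labelings by (r actual matches, j correct matches)."""
    counts: Dict[Tuple[int, int], int] = {}
    for labels in product([WILD] + list(range(m)), repeat=k):
        r = sum(1 for face in labels if face != WILD)
        j = sum(1 for i, face in enumerate(labels) if i < c and face == i)
        counts[(r, j)] = counts.get((r, j), 0) + 1
    return counts


def count_bound(k: int, c: int, choices: int, r: int, j: int) -> int:
    """C(c,j) * C(k-j, r-j) * choices^(r-j); choices is m (upper) or m-1 (lower)."""
    return comb(c, j) * comb(k - j, r - j) * choices ** (r - j)


k, c, m = 4, 2, 3
tallies = count_by_enumeration(k, c, m)
within_bounds = all(
    count_bound(k, c, m - 1, r, j) <= n <= count_bound(k, c, m, r, j)
    for (r, j), n in tallies.items()
)
print(within_bounds, sum(tallies.values()))
```

The gap between the two bounds comes entirely from on-object points that are matched incorrectly: they have only $m - 1$ wrong faces available, while points off the object have all m.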

We need to determine which of these interpretations are consistent. For the
unary constraints, any wild card match is consistent with probability 1, as is any
correct match. The remaining $r - j$ incorrect matches each have probability of
consistency $p_1$. Thus, we have

$$\Pr[\,i \mapsto I \text{ is consistent}\,] =
\begin{cases}
1 & \text{if } i \mapsto I \text{ is correct,} \\
1 & \text{if } I \text{ is the wild card character,} \\
p_1 & \text{otherwise.}
\end{cases}$$

Any pair of assignments, both of which are correct, is consistent with probability
1. Any pair of assignments, at least one of which is assigned to the wild card, also
is consistent with probability 1. Thus, we have

$$\Pr[\,i \mapsto I,\ j \mapsto J \text{ are consistent}\,] =
\begin{cases}
1 & \text{if } i \mapsto I,\ j \mapsto J \text{ are correct,} \\
1 & \text{if either } I \text{ or } J \text{ is the wild card character,} \\
p_2 & \text{otherwise.}
\end{cases}$$

Hence, to derive bounds on the number of consistent nodes, we need only
consider pairs of assignments chosen from the r actual matches. There are $\binom{r}{2}$ such
pairs. Of these, however, $\binom{j}{2}$ have a consistency of 1, because they correspond to
correct matches. Thus, the number $n_{k,r}$ of interpretations of length r from k sensory
points is bounded by

$$\sum_{j=0}^{c}\binom{c}{j}\binom{k-j}{r-j}(m-1)^{r-j}\,p_1^{r-j}\,p_2^{\binom{r}{2}-\binom{j}{2}}
\;\le\; n_{k,r} \;\le\;
\sum_{j=0}^{c}\binom{c}{j}\binom{k-j}{r-j}\,m^{r-j}\,p_1^{r-j}\,p_2^{\binom{r}{2}-\binom{j}{2}}. \qquad (4)$$

Finding tight, closed form expressions for the bounds in equation (4) is somewhat
difficult. Instead, we consider the total number of interpretations,

$$n^*_k = \sum_{r=0}^{k} n_{k,r}.$$


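We first consider an upper bound on this expression:

$$n^*_k \le \sum_{r=0}^{k}\sum_{j=0}^{c}\binom{c}{j}\binom{k-j}{r-j}\,m^{r-j}\,p_1^{r-j}\,p_2^{\binom{r}{2}-\binom{j}{2}}.$$

We begin by considering the sum over r:

$$\sum_{r=j}^{k}\binom{k-j}{r-j}\,m^{r-j}\,p_1^{r-j}\,p_2^{\binom{r}{2}}
= \sum_{t=0}^{k-j}\binom{k-j}{t}\,m^{t}\,p_1^{t}\,p_2^{\binom{j+t}{2}}.$$

The dominant terms in this sum will be for small t, because $p_2 < 1$, hence we expand
out the first few terms, yielding

$$p_2^{\binom{j}{2}} + m p_1 (k-j)\,p_2^{\binom{j+1}{2}}
+ \sum_{t=2}^{k-j}\binom{k-j}{t}\,m^{t}\,p_1^{t}\,p_2^{\binom{j+t}{2}}.$$

To get an upper bound on this expression, we need to replace the exponent of $p_2$
with a smaller linear expression in t, so that the above sum is bounded above by

$$p_2^{\binom{j}{2}} + m p_1 (k-j)\,p_2^{\binom{j+1}{2}}
+ \sum_{t=2}^{k-j}\binom{k-j}{t}\,m^{t}\,p_1^{t}\,p_2^{\binom{j+1}{2}+\frac{t(j+1)}{2}}$$

or

$$\left(p_2^{\binom{j}{2}} - p_2^{\binom{j+1}{2}}\right)
+ \left[1 + m p_1 p_2^{\frac{j+1}{2}}\right]^{k-j} p_2^{\binom{j+1}{2}}
+ m p_1 (k-j)\left(p_2^{\binom{j+1}{2}} - p_2^{\binom{j+1}{2}+\frac{j+1}{2}}\right). \qquad (5)$$
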

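We can now consider the summation over j, treating each of the terms above in
turn. Taking the first two terms of (5) yields

$$\sum_{j=0}^{c}\binom{c}{j}\,p_2^{-\binom{j}{2}}\left[p_2^{\binom{j}{2}} - p_2^{\binom{j+1}{2}}\right]
= \sum_{j=0}^{c}\binom{c}{j}\left[1 - p_2^{\,j}\right] = 2^c - [1 + p_2]^c. \qquad (6)$$

We can bound the third term of (5) as follows

$$\sum_{j=0}^{c}\binom{c}{j}\left[1 + m p_1 p_2^{\frac{j+1}{2}}\right]^{k-j} p_2^{\binom{j+1}{2}-\binom{j}{2}}
< \sum_{j=0}^{c}\binom{c}{j}\left[1 + m p_1 p_2^{1/2}\right]^{k-j} p_2^{\,j}$$

$$= \left[1 + m p_1 p_2^{1/2}\right]^{k-c}\left[p_2 + 1 + m p_1 p_2^{1/2}\right]^c. \qquad (7)$$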
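
The last step of (7) is a binomial collapse: with $A = 1 + m p_1 p_2^{1/2}$, the sum $\sum_j \binom{c}{j} A^{k-j} p_2^{\,j}$ factors as $A^{k-c}(A + p_2)^c$. A small numerical check of that step, with hypothetical parameter values and illustrative function names, is sketched below.

```python
from math import comb, isclose, sqrt


def third_term_sum(k: int, c: int, m: int, p1: float, p2: float) -> float:
    """Sum over j of C(c,j) * A^(k-j) * p2^j, with A = 1 + m*p1*sqrt(p2)."""
    base = 1.0 + m * p1 * sqrt(p2)
    return sum(comb(c, j) * base ** (k - j) * p2 ** j for j in range(c + 1))


def third_term_closed(k: int, c: int, m: int, p1: float, p2: float) -> float:
    """Closed form A^(k-c) * (p2 + A)^c of the same sum."""
    base = 1.0 + m * p1 * sqrt(p2)
    return base ** (k - c) * (p2 + base) ** c


# Hypothetical values: 12 data points, 5 on the object, 20 model faces.
lhs7 = third_term_sum(k=12, c=5, m=20, p1=0.2, p2=0.01)
rhs7 = third_term_closed(k=12, c=5, m=20, p1=0.2, p2=0.01)
print(lhs7, rhs7)
```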
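The final two terms of equation (5) become

$$m p_1 \sum_{j=0}^{c}\binom{c}{j}(k-j)\,p_2^{\,j}\left(1 - p_2^{\frac{j+1}{2}}\right)$$

and this can be bounded above by replacing the exponent with a smaller expression,

$$m p_1\left(1 - p_2^{1/2}\right)\sum_{j=0}^{c}\binom{c}{j}(k-j)\,p_2^{\,j}.$$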


To reduce this, we note that if we let

$$f(x) = (1 + x)^n = \sum_{i=0}^{n}\binom{n}{i}x^i$$

then

$$\frac{df(x)}{dx} = n(1 + x)^{n-1} = \sum_{i=0}^{n}\binom{n}{i}\,i\,x^{i-1}$$

so that

$$\sum_{i=0}^{n}\binom{n}{i}\,i\,x^{i} = n x (1 + x)^{n-1}.$$

Hence,

$$\sum_{i=0}^{n}\binom{n}{i}(a - i)\,x^{i} = a(1 + x)^n - n x (1 + x)^{n-1} = (1 + x)^{n-1}\,[a + x(a - n)].$$
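
This identity is easy to confirm numerically. The sketch below compares the direct sum $\sum_i \binom{n}{i}(a - i)x^i$ with the closed form $(1+x)^{n-1}[a + x(a-n)]$; the function names and test values are illustrative assumptions.

```python
from math import comb, isclose


def weighted_binomial_sum(n: int, a: float, x: float) -> float:
    """Direct evaluation of sum_{i=0}^{n} C(n,i) * (a - i) * x^i."""
    return sum(comb(n, i) * (a - i) * x ** i for i in range(n + 1))


def weighted_binomial_closed(n: int, a: float, x: float) -> float:
    """Closed form (1 + x)^(n-1) * (a + x*(a - n))."""
    return (1.0 + x) ** (n - 1) * (a + x * (a - n))


# In the proof the roles are n = c, a = k and x = p2; values here are arbitrary.
direct = weighted_binomial_sum(n=6, a=9.0, x=0.25)
closed = weighted_binomial_closed(n=6, a=9.0, x=0.25)
print(direct, closed)
```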


Thus, the final two terms of (5) reduce to

$$m p_1\left(1 - p_2^{1/2}\right)(1 + p_2)^{c-1}\,[k + p_2(k - c)]. \qquad (8)$$

By combining equations (6)-(8), we get

$$n^*_k < 2^c - [1 + p_2]^c + \left[1 + m p_1 p_2^{1/2}\right]^{k-c}\left[p_2 + 1 + m p_1 p_2^{1/2}\right]^c
+ m p_1\left[1 - p_2^{1/2}\right][1 + p_2]^{c-1}\,[k + p_2(k - c)]. \qquad (9)$$

We can use a similar approach to obtain a lower bound on the number of
interpretations. We have


r= 0  j—0  J 

As  before,  we  begin  with  the  summation  over  r,  which  reduces  to 

^  (' to 


Expanding  out  terms  yields 


Pi  +  (m  -  1  )p\ ( k  -  j)p\  •  + 


z(V 

4 - \ 


V) 


(m  -  1  yp\p\ 


In this case, we need to replace the exponent for $p_2$ with a linear expression in t
which is greater than the current one, because this will lead to smaller expressions
in $p_2$. We obtain


P2  +  (m  -  .1  )pi  ( k  -  j)p[ 


2  2  *  +  XJ  (k  f  J)<'m  ~  1 Y Pi P2 


(fc-JLXill) 


(M  f  1 1  fc-i .  L.  .  n*-i) 

Pi  ^  +  (  m  -  1  )p,  ( l:  -  j  \  p  ,  ’  4-  [  I  4-  (  m  -  1  )pi  p^~ }  3  p^P 5 

(  >  4  1  if  *  -  1  ■  )  (  k  -  I  , 

-  {in  -  1  )p  1  ( fc  -  ./)/>,  *  -  p2  5  . 


Using  the  same  methods  as  before,  this  reduces  to 

fc  -f  1  .  fc  \  f.  fc-1  fc-  1 

>  2‘’ ~  [1  +/V  ]'  +  I1  -!-  (in  b/'i/V  j'  [l  +  (  m -  l)pipP~  +  pP~  \ 

+  P\(m  -  1  )[l  4  />_>]  '  [/.•  4-  Pil fc  -  cl] 

-  /;,(//)  -  1  )/>“[!  4-  /vH'  '  [fc  4  pP~ (k  -  c)l.  (1 
Once we have a relationship between the probability of consistency and the
parameters of the problem, we can derive specific bounds on the number of
interpretations. In section 4 of the paper, we derive such relationships.


Proposition 5: If all of the k sensory measurements are known to lie on a
single two-dimensional object with m equal sized edges of length L, and the sensory
data is distributed uniformly in transform space, with a uniform length distribution,
then the number of k-interpretations is asymptotic to 1.

Proof: From equations (2) and (3) we have

$$n_k < \left[1 + (m-1)\,p_1\,p_2^{\frac{k-1}{2}}\right]^k \qquad (2)$$

$$n_k > 1 + \left[1 + p_1(m-1)\right]^k p_2^{\frac{k(k-1)}{2}} - p_2^{\frac{k(k-1)}{2}} \qquad (3)$$

In the case of two dimensional recognition, we substitute from Proposition 3 to get:

$$n_k < \left[1 + \frac{\kappa^{k-1}}{m^{k-1}}\,p_1(m-1)\right]^k$$

$$n_k > 1 + \left(\frac{\kappa}{m}\right)^{k(k-1)}\left[1 + p_1(m-1)\right]^k - \left(\frac{\kappa}{m}\right)^{k(k-1)} \qquad (11)$$


To establish the result, we need to show that

$$(1 + a x^k)^k > (1 + a x^{k+1})^{k+1}$$

for some k, where $x < 1$. This is equivalent to showing that

$$\left[1 + \frac{a x^k (1 - x)}{1 + a x^{k+1}}\right]^k > 1 + a x^{k+1}.$$

If we can show that

$$1 + \frac{k a x^k (1 - x)}{1 + a x^{k+1}} > 1 + a x^{k+1}$$

then we are done, because the left hand side is just the first two terms of the
expanded product. To establish this, we simply need to show that

$$k(1 - x) > (1 + a x^{k+1})\,x$$

for some k, but this is clearly true if $x < 1$.

Thus, for k large enough, both the upper and the lower bounds tend to 1. ∎
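
The convergence argument can be illustrated numerically. The sketch below evaluates the shape $(1 + a x^k)^k$ used in the proof for increasing k; the values of a and x are hypothetical and stand in for the constants produced by the substitution, and the monotone decrease observed from k = 1 is a property of these sample values rather than a general claim.

```python
def upper_bound_shape(a: float, x: float, k: int) -> float:
    """Evaluate (1 + a * x^k)^k, the form analysed in the proof."""
    return (1.0 + a * x ** k) ** k


# Hypothetical constants: a = 50, x = 0.1 (x < 1 is what the proof requires).
values = [upper_bound_shape(50.0, 0.1, k) for k in range(1, 21)]
for k, v in enumerate(values[:6], start=1):
    print(k, v)
```

After only a handful of levels the bound is indistinguishable from 1, which matches the observation that a few data fragments suffice to isolate the interpretation when all data lie on one object.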

To  establish  bounds  on  the  amount  of  search  needed,  we  use  the  following: 


Proposition 6: If all of the k sensory measurements are known to lie on a
single two-dimensional object with m equal sized edges of length L, m > 2, the
sensory data is distributed uniformly in transform space, with a uniform length
distribution, and if the noise is small enough, then the expected amount of search
needed to find the interpretation is bounded by

$$m^2 < N_s < m^2 + \alpha m s$$

where $\alpha$ is a constant that depends on the object characteristics and the amount of
noise in the sensory measurements. ∎

Proof: To get bounds on the amount of search in the two dimensional case,
recall that this amount is given by:

$$N_s = \sum_{k=1}^{s-1} m\,n_k.$$

To bound this, we could simply find the largest term in the summation, and use ms
times that term as an upper bound, since there are s terms in the sum. To do this
explicitly, we first consider the constant $\kappa$, given by Proposition 3

«  =  -  **)]  +  -  ft*)2  Jj 

To ease the analysis, we will restrict our attention to cases in which $\kappa < 1$, although
a  similar  analysis  will  hold  for  other  cases.  To  do  this,  we  note  that  the  error  in 
determining  angles  can  be  obtained  as  a  function  of  the  error  in  determining  position, 
by  considering  the  worst  case  deviation,  which  yields  ea  =  tan-1  2c*.  Thus,  we  have: 

Claim:  If  the  perimeter  of  an  object  P,  the  dimension  of  the  image  D  and  the 
error  in  measuring  positions  relative  to  the  length  of  a  model  edge  c*  =  ^  satisfy 
the  relationship: 


then 

$\kappa < 1$. ∎

This follows naturally from Proposition 3. It is worth noting that the conditions
for this proposition are satisfied for most situations. For example, if the relative
sensing error and the minimum edge length are .1, that is, the error in determining
position is no more than one tenth the length of the model edges, then so long
as the perimeter of the object is less than 5 times the dimension of the image, the
proposition is satisfied. Even when the error rises to .5, the perimeter can be roughly
as large as the image dimensions.

If the proposition holds, it is straightforward to show that the upper bound
for $n_k$ given in equation (11) is a maximum for k = 1, in this case being equal to
$p_1 m + 1 - p_1$. Because there are roughly s terms in the summation, this leads to
the bound

$$N_s < m^2 s.$$

(Note that since $p_1$ is generally a constant, independent of m, using the upper bound
of $p_1 < 1$ does not radically change the derived bound.) We can improve on this,


however, by noting that if $\kappa < 1$, then, from equation (11),

$$n_1 < p_1 m + 1 - p_1$$

$$n_2 < [1 + p_1\kappa]^2$$

$$n_3 \le \left[1 + p_1\frac{\kappa^2}{m}\right]^3$$

We want to show that under the conditions of Proposition 6, the upper bound on $n_k$
is monotonically decreasing. By taking the derivative of this expression with respect
to k and considering the worst case, in which $\kappa = 1$, $p_1 = 1$, we need to establish
that

$$(1 + m^{2-k})\log(1 + m^{2-k}) + m^{2-k}\,k\log\frac{1}{m} < 0.$$

For $k \ge 3$, we can approximate the first log by its second argument, so that we need
to establish that

$$1 + m^{2-k} < k\log m.$$

Since the left hand side decreases with increasing k and the right hand side increases
with increasing k, we need only establish this for k = 3. This expression holds for
k = 3 if m > 1.698, which is trivial to assume. Hence, the expression is monotonically
decreasing for $k \ge 3$, and case analysis shows this also holds for k = 1, 2. Hence, for
large m, s we have

$$N_s < p_1 m^2 + (1 + p_1\kappa)^2\,m s + m(1 - p_1). \qquad (12)$$
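
The threshold quoted in this argument can be reproduced with a few lines of code. The sketch below assumes that log denotes the natural logarithm (an assumption, though it is the reading that reproduces the quoted value) and finds by bisection the smallest m for which $1 + m^{2-k} < k\log m$ holds at k = 3. The function names are illustrative.

```python
import math


def gap(m: float, k: int) -> float:
    """k*log(m) - (1 + m^(2-k)); positive exactly when the condition holds."""
    return k * math.log(m) - (1.0 + m ** (2 - k))


def threshold_m(k: int = 3, lo: float = 1.0001, hi: float = 10.0) -> float:
    """Bisection for the root of gap(., k), which is increasing in m for k >= 3."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gap(mid, k) > 0:
            hi = mid
        else:
            lo = mid
    return hi


m_star = threshold_m()
print(round(m_star, 4))
# For any m above the threshold the condition also holds for larger k.
holds_for_larger_k = all(gap(2.0, k) > 0 for k in range(3, 30))
```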

Note  that  if  a  tighter  constant  is  desired,  we  can  expand  out  several  more  terms  in 
the  summation,  before  bounding  the  remainder. 

Similarly, the lower bound on $n_k$ given in equation (11) is a maximum for k = 1,
having the value m, provided $\kappa < 1$. Thus

$$N_s > p_1 m^2 + m(1 - p_1). \qquad (13)$$

If we simply let $p_1 = 1$, we establish the proposition. ∎

For  the  three-dimensional  case,  we  have  a  similar  argument. 

Proposition 7: If all of the k sensory measurements are known to lie on a
single three-dimensional object with m equal sized edges of dimension L, m > 2,
the sensory data is distributed uniformly in transform space, with a uniform area
distribution, and if the noise is small enough, then the number of interpretations is
asymptotic to 1, and the expected amount of search needed to find the
interpretation is bounded by


m2  <  Na  <  m  m  +  -f  2.  +  s  . 


Proof: 

For  the  case  of  three  dimensional  recognition,  we  substitute  from  Proposition 
4  into  equations  (2)  and  (3),  to  get: 


+  rn  -  1 


:  k(k- 1) 

I) 

m  « 


<nk< 


(m  -  1)k*' 


For  k  =  1,  we  have 


for  k  =  2,  we  have 


m  <  «i  <  *n 


o  1  1  K3  K3 

l  +  KgmT - r+—  ~  -3 

in  t  ?n  ?  J  m 


1  K3 

<  n2  <  1  +  K3™1 - T 

m  r 


and for k = 3, we have


j  1  1  k3  13  „  r  ,  ,/  1  1 

1  +  k3  — j - r  d — t  rr  <  ”3  <  1  +  K3l  1  3 

.  m  ?  m?  m  t  m .  \  m '  m  2 

Again, as k continues to increase, we have

$$n_k \to 1.$$

As before, we can substitute to obtain the desired expressions. ∎


For the case of occluded objects, we can use equations (9) and (10):

$$n^*_k < 2^c - [1 + p_2]^c + \left[1 + m p_1 p_2^{1/2}\right]^{k-c}\left[p_2 + 1 + m p_1 p_2^{1/2}\right]^c
+ m p_1\left[1 - p_2^{1/2}\right][1 + p_2]^{c-1}\,[k + p_2(k - c)] \qquad (9)$$

K  >  2C  -  [l  +  tvH0  +  [l  +(m  -  l)p1p2^rL]'‘"_C[l  +  (m-  l)pip^~  +  P?~ ]° 

+  pi(m  -  l)[l  +  p>]c  1  {k  +  p2(k  -  c)) 


Pi(m  -  1  )pP~  [ l  +  pP~ ] C  [k  +  p^ (k  -  c)]. 


Two  dimensional  case 

To  relate  these  bounds  on  the  number  of  interpretations  to  characteristics  of  the 
objects,  we  substitute  from  (11).  This  gives 

T  .2  I  >•  I-  k2  c 

nfc  ^  2C  —  1  H — —  +  [l  +  KPi]  1  +  KPi  +  ~2 


K  K' 

+  mp\  1 - 1-1 - 

in  III 


-t]  U  +  ^(k  -  c) 
n-J  m- 


nmk.>  2C  -  1  +  — 

K  ~  \  rn  , 


+  l  +  ( in  -  1  )p  i  I  — 


l  +  (”'-1)','U)  +  U, 


+  P\(m  -  I )  (  l  -f  — r  )  (  k  +  —(k  -  c) 

\  m-  I  \  m 


-  P i(n>  ~  1)| 


,  ,  k  i"|  r—  1  r  /  \  *-c 
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To  reduce  this  to  a  more  manageable  form,  we  will  assume  that  the  conditions 
of  Proposition  6  hold.  If  m  is  also  large,  then  this  rather  messy  bound  reduces  to 
the  following. 


Proposition 8: If $c_0$ of the k sensory measurements lie on a two-dimensional
object with m equal sized edges of length L, the sensory data is distributed uniformly
in transform space, with a uniform length distribution, and if the noise is small
enough, then the expected number of interpretations, for m large, is bounded by


2C°  <nmk<  2C°  +  [l  +pi«]* 


,2  Vo 


1  +  — J 
mi 


Now  we  turn  to  the  problem  of  bounding  the  amount  of  search  required  in  this 
case.  We  establish  the  following  claim. 


Proposition 9: If $c_0$ of the k sensory measurements lie on a two-dimensional
object with m equal sized edges of length L, the sensory data is distributed uniformly
in transform space, with a uniform length distribution, and if the noise is small
enough, then the expected amount of search needed to find the interpretations, for
m large, is bounded by


N ;  <  m 


[1  +  pt/t]J  -  [1  +  pi  it] 
pXK 


+  2Co[s  -  c0  -f  1]  -  2 
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p'm{1  ~  £)  [? + [1 + «r  [(a)  -  (2)  +  0(J(i +„)']]. 
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where 


Proof: 


The  upper  bound  on  the  search  is  given  by: 
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The  second  term  is  simply  a  geometric  series,  and  is  easily  reduced  to  closed  form. 
To  obtain  explicit  bounds  on  the  other  terms  of  the  summation,  however,  we  need  to 
know  something  about  the  subset  of  the  data  fragments  that  are  part  of  the  correct 
interpretation,  that  is,  we  need  to  know  how  c  changes  with  k. 
In general,

$$c(k) = \begin{cases}
0, & k < i_1 \\
1, & i_1 \le k < i_2 \\
\;\vdots \\
c_0, & i_{c_0} \le k \le s - 1
\end{cases}$$

in  which  case 


J2^c(k)  =  (*'i  -  1)2°  +  (h  -  ii)21  +  ...  +  (ie 0  -  »c0-i )2C°-1  +  (s  -  1  -  iCo )2C° . 

/c=l 

The  worst  case  for  this  sum  is  when  ij  =  j,  in  which  case,  the  sum  reduces  to 

co-l 

(3  —  e0  —  1  )2C°  +  Y  2‘ 

i  =  0 

(5  -  c0  -  I )2l'°  +  2C°  -  2 
2c°{s  -  co  +  1]  -  2. 

Now  consider  the  term 


1  ”  m  )P'mYlk[l  +  al 


By  the  above  assumption  about  c(k)  the  summation  part  of  this  becomes 

Y  Ml  +  a]0  +  Y  Ml  +  a]1  +  •  ■  •  Y  Ml  +  «]C0_1  +  [1  +  «]eo  Y  k 

k=l  k=i i  fc= b0-i  k='c0 

and  again  the  worst  case  is  when  ij  =  j,  in  which  case,  the  sum  reduces  to 

Co  —  1  5—1 

£*[l  +  a]fc-,+[l+a]"'°  Y  k 

k=l  k='c0 

L  V  /  V  -  /  J  A._, 

To bound the remaining summation, we can use the arithmetico-geometric
progression:

$$\sum_{k=0}^{n-1}(a + kr)\,q^k = \frac{a - [a + (n-1)r]\,q^n}{1 - q} + \frac{r q\,(1 - q^{n-1})}{(1 - q)^2}.$$

In our case we have $a = 0$, $r = 1$ and $q = 1 + \alpha$, so that
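
The arithmetico-geometric progression used here is the standard identity $\sum_{k=0}^{n-1}(a + kr)q^k = \frac{a - [a + (n-1)r]q^n}{1-q} + \frac{rq(1 - q^{n-1})}{(1-q)^2}$, valid for $q \ne 1$. The sketch below checks it against a direct sum, including the case $a = 0$, $r = 1$, $q = 1 + \alpha$ used in the proof; the value of $\alpha$ and the function names are hypothetical.

```python
from math import isclose


def agp_direct(a: float, r: float, q: float, n: int) -> float:
    """Direct evaluation of sum_{k=0}^{n-1} (a + k*r) * q^k."""
    return sum((a + k * r) * q ** k for k in range(n))


def agp_closed(a: float, r: float, q: float, n: int) -> float:
    """Closed form of the arithmetico-geometric progression (requires q != 1)."""
    if q == 1.0:
        raise ValueError("closed form requires q != 1")
    first = (a - (a + (n - 1) * r) * q ** n) / (1.0 - q)
    second = r * q * (1.0 - q ** (n - 1)) / (1.0 - q) ** 2
    return first + second


alpha = 0.05  # hypothetical small constant
direct_sum = agp_direct(0.0, 1.0, 1.0 + alpha, 8)
closed_sum = agp_closed(0.0, 1.0, 1.0 + alpha, 8)
print(direct_sum, closed_sum)
```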


Y  M1  + 


1  -  [1  +  o]^-'  ,  (Co  -  l)[l+o] 

n\  =  - , - +  - 

o-  a 


+  (o(<0  -  1)  -  I  )[1  +  o]1'0-1 
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This  yields: 


<v;  <  m ( ll±H^l  L—J^n  +  2c»[s  -  Co  +  1]  -  2 

V  Pi K 


—  ft  +  ptjc]  ,  f] 

>1  K 

+  Pim(l-— )  -4  +  [1  +  a]c° 

\  m )  [a2 

For  the  lower  bound,  we  have 
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(3\  _  (c°\  ,  <*(cq-  i)  —  i'll  A 
\2/  V  2  /  +  a2(l  +  a)  JJ/ 


A;=l 


Here,  the  worst  case  occurs  when  c  is  0  for  the  first  s  —  c0  -  1  terms,  and  then 
increases  linearly,  yielding 
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